SQL Part 10

Htoo Latt
3 min readMar 29, 2022

In the last blog post, I left off using the DISTINCT statement along with aggregation. One thing to keep in mind with using the DISTINCT statement is that when they are used in conjunction with aggregate functions, they tend to slow the query time by a large amount.

In today’s blog post, I will be introducing JOINS and how they can be used. We mostly have been working with a single table in our previous blog posts. A large reason why SQL is so commonly used is that it allows us to work with data from different tables at the same time.

Before we talked about JOINs, we first need to about the schema that the database has. In a relational database, each table is related to one another. Meaning that they shared common identifiers that allow the data from several tables to be combined easily and logically.

To have a better understanding of JOINs let us see them in action. For this exercise let’s use a new set of data. The new dataset contains information on movies and users’ ratings for them. Our dataset contains three tables with different sorts of information.

Let’s take a look at each table first.

SELECT * FROM links

This table contains the movie id used to identify the movies in our dataset and the movie unique identifier codes on two different movie database websites IMDB and TMDB. Let’s take a look at another table from the dataset.

SELECT * FROM movies

This table contains three columns. The first is the movie id that was also found in the first table, the second column contains the movie title and finally, the last column contains the genres of the movie. We can see that the genre column can contain several different types which are split by a vertical bar. Since this is a table containing the movies only, there are no repeating values for the movie titles and the movie id. The table contains 193609 rows of movies. Now let’s take a look at the final table.

SELECT * FROM ratings.

This table contains four columns. The first is a unique identifier for each user, the second column contains the movie identifiers that were also found in the previous two tables, the third column contains the rating that was given by the user to the movie and finally the fourth column contains a timestamp of when the rating was given.

Now let’s try joining a table

SELECT *
FROM movies
JOIN links
ON movies.movieId = links.movieId

In the above query, I selected all the data from the tables movies and links. I specified that the two tables should be joined on the unique identifiers moviedId which exists in both tables.

--

--