SQL Part 7

Htoo Latt
3 min readFeb 7, 2022

In this blog post, I will be talking about several extremely useful functions that exist in SQL.

GROUP BY

The first of these is the GROUP BY function. The aggregate functions that we went through in the previous blogs like COUNT, AVG, and SUM aggregate across the entire dataset. If your goal is to aggregate only a partial amount of the data, you would need to use the GROUP BY function. For example, using my ramenratings dataset, if I wanted to COUNT the amount of ramen separated by countries, I would need to use GROUP BY.

The query would look like below

SELECT country,
COUNT(*)
FROM ramenratings
GROUP BY 1
ORDER BY 2 DESC

The query above returns a table containing a column with a list of countries in the dataset and another column containing the number of entries for each country. It is then sorted by the number of ramens in descending order.

It is also possible to aggregate the data by multiple columns.

SELECT country,
style,
COUNT(*)
FROM ramenratings
GROUP BY 1, 2
ORDER BY 1, 3 DESC

The query above returns a table containing a count of the number of ramens separated by style and country. I sorted the table so that the countries are placed together.

SQL CASE

The CASE statement is SQL version if/ then logic. The CASE statement is followed by pairs of WHEN and THEN statements. The CASE statement ends with an END statement. ELSE statement could also be used if you desire a value to be placed into the column if the CASE statement is not met. Let us use an example to show the CASE statement in action.

SELECT brand,
variety,
CASE WHEN country = ‘japan’ THEN ‘yes’
ELSE ‘no’ END as ramen_from_japan
FROM ramen_ratings

Let’s go through what is happening in the query above.

First, the CASE statement goes through each row and checks whether the conditional statement of country = ‘japan’ is true or not.

Second, for any row where the conditional statement is true, the word ‘yes’ gets placed into a new column created by the query that I have named ‘ramen_from_japan’.

Third, for any row where the conditional statement is not true, the word ‘no’ gets placed into the ‘ramen_from_japan’ column.

Fourth, it also returns the brand and variety columns for each row.

From now on let’s use a new dataset containing the top 1000 highest-grossing Hollywood films. The dataset contains the gross for US domestic, International, and the combination of the two called Worldwide.

Other than the gross, the dataset also contains the film's distributors, release date, rating, genre, runtime, and also a short description of the movie.

Let’s try using the CASE statement on the new dataset.

SELECT title,
distributor,
CASE WHEN ‘world_sales_(in_$)’ > 1000000000 THEN ‘More than a
billion'
ELSE ‘Less than a billion’ END as billion_benchmark
FROM HighestHolywoodGrossingMovies

The query selects the title, distributor and also creates a new column that tells us if the movie grossed more than a billion us dollars. Noticed that the column name for gross had to be in quotes.

--

--