Matplotlib Tutorial-Part2

Htoo Latt
Dec 16, 2021

In my previous post, I went through the basics of Matplotlib and how to change the style of graphs. I also covered how to plot a bar graph and when to use one. In this post, I will go over several more plot types and show examples of them in use.

Histogram

A histogram and a bar graph look very similar, but there is an important difference. A histogram shows the distribution of a single numeric variable, whereas a bar chart compares values across different categories. Histograms are useful for arrays or large lists of numeric data.

An example of data that could be plotted with a histogram is the ages of a population. First, we divide the range of ages into a series of intervals called bins and count how many data points fall into each one. Bins usually all have the same width, but they don't have to, for example when the data points are heavily concentrated in one area. The shape of a histogram can be interpreted the same way as a distribution: it can be bell-shaped, which suggests a normal distribution; it can be skewed toward one side; or it can be bimodal, with two peaks.
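To make the idea of binning concrete, here is a minimal sketch that counts how many values fall into each bin using NumPy's `numpy.histogram`. The `ages` list and both sets of bin edges are made up for illustration; they are not data from this post.

```python
import numpy as np

# Hypothetical ages, chosen only for illustration
ages = [3, 7, 15, 22, 24, 29, 31, 38, 45, 52, 67, 81]

# Equal-width bins: 0-20, 20-40, 40-60, 60-80, 80-100
equal_edges = [0, 20, 40, 60, 80, 100]
equal_counts, _ = np.histogram(ages, bins=equal_edges)

# Unequal-width bins: narrower where the data is concentrated (the 20s and 30s)
unequal_edges = [0, 10, 20, 25, 30, 40, 100]
unequal_counts, _ = np.histogram(ages, bins=unequal_edges)

print(equal_counts.tolist())    # [3, 5, 2, 1, 1]
print(unequal_counts.tolist())  # [2, 1, 2, 1, 2, 4]
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. Keep in mind that with unequal widths the raw counts are harder to compare, since a wide bin naturally collects more points.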

Below, I create bins at intervals of 10 and plot the distribution of a random list of 100 ages ranging from 0 to 100.

import random
import matplotlib.pyplot as plt


# Generate 100 random ages between 0 and 100 (sampled without replacement, so no age repeats)
population_age = random.sample(range(0, 101), 100)
print(population_age)

# Bin edges at intervals of 10
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
plt.hist(population_age, bins, histtype='bar', color='skyblue', rwidth=0.8)
plt.xlabel('Age Groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
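
The distribution shapes described earlier in this section (bell-shaped, skewed, and bimodal) are easier to recognize side by side. The sketch below draws one histogram of each, using synthetic samples from NumPy's random generator. The sample sizes, means, and spreads are arbitrary choices of mine for illustration, not real population data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=0)  # fixed seed so the figure is reproducible

# Synthetic samples, made up for illustration only
samples = {
    'Bell-shaped (normal)': rng.normal(loc=50, scale=10, size=1000),
    'Right-skewed': rng.exponential(scale=15, size=1000),
    'Bimodal': np.concatenate([
        rng.normal(loc=30, scale=5, size=500),
        rng.normal(loc=70, scale=5, size=500),
    ]),
}

# One histogram per shape, drawn side by side
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (title, data) in zip(axes, samples.items()):
    ax.hist(data, bins=20, color='skyblue', rwidth=0.8)
    ax.set_title(title)
    ax.set_xlabel('Value')
axes[0].set_ylabel('Count')
fig.tight_layout()
plt.show()
```

Here `bins=20` asks Matplotlib to split each sample's range into 20 equal-width bins instead of passing explicit edges.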

Scatter Plot

A scatter plot is usually used to show the relationship between two variables, typically an independent and a dependent variable, for example to show how strongly they are correlated. The data is shown as a collection of points, with the value of one variable on the horizontal axis and the other on the vertical axis. A best-fit line can be added to highlight the trend in the data.

import matplotlib.pyplot as plt
import pandas as pd

# Load the Ames housing dataset (assumes AmesHousing.csv is in the working directory)
df = pd.read_csv('AmesHousing.csv')

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(x=df['Gr Liv Area'], y=df['SalePrice'])
ax.set_xlabel("Living Area Above Ground")
ax.set_ylabel("House Price")

plt.show()

The plot above shows house prices relative to above-ground living area. A best-fit curve would trend upward and bend slightly upward. You can also see a few outliers in the bottom right corner that don't follow the trend. A scatter plot is extremely useful for gauging the relationship between two variables. Two unrelated variables would produce points that are spread out randomly, while two negatively correlated variables would produce points whose best-fit line slopes downward.
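
To illustrate the three cases just described, here is a rough sketch that plots positively correlated, unrelated, and negatively correlated data, each with a straight best-fit line. It uses synthetic data rather than the Ames housing file, and fitting the line with `numpy.polyfit` is my own choice for this example. The slopes and noise levels are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility

# Synthetic data, made up for illustration (not the Ames housing data)
x = rng.uniform(0, 100, size=200)
datasets = {
    'Positive correlation': 0.8 * x + rng.normal(0, 10, size=200),
    'No correlation': rng.uniform(0, 100, size=200),
    'Negative correlation': 100 - 0.8 * x + rng.normal(0, 10, size=200),
}

fits = {}
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (title, y) in zip(axes, datasets.items()):
    # Degree-1 polynomial fit returns the slope first, then the intercept
    slope, intercept = np.polyfit(x, y, deg=1)
    # Pearson correlation coefficient between x and y
    r = np.corrcoef(x, y)[0, 1]
    fits[title] = (slope, r)

    ax.scatter(x, y, s=10)
    line_x = np.array([x.min(), x.max()])
    ax.plot(line_x, slope * line_x + intercept, color='red')
    ax.set_title(f'{title} (r = {r:.2f})')
fig.tight_layout()
plt.show()
```

A straight line is the simplest choice. For a trend that bends, like the housing data above, passing a higher `deg` to `np.polyfit` fits a curve instead.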
