Introduction:
For my Capstone Project, "Visualizing Migration in Canada", I scraped data from Statistics Canada and generated several plots (bar, scatter, time series, choropleth map) visualizing migration and settlement patterns in Canada. As part of that project, I used the matplotlib and GeoPandas libraries to create a choropleth map of the growth rate of the non-permanent resident (NPR) population by province and territory from the third quarter of 2022 to the third quarter of 2024, a period in which Canada experienced record-high non-permanent migration due to COVID-19-related labour shortages.
Methodology:
Steps taken:
Scraped the total non-permanent resident population, broken down by province and territory, from Statistics Canada and pre-processed the data in MS Excel (source: https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710012101&pickMembers%5B0%5D=1.1&cubeTimeFrame.startMonth=01&cubeTimeFrame.startYear=2021&cubeTimeFrame.endMonth=10&cubeTimeFrame.endYear=2024&referencePeriods=20210101%2C20241001)
Applied pandas filtering to keep only the rows for Q3 2022 and Q3 2024.
Used pandas arithmetic to calculate the growth rate of each province's and territory's non-permanent resident population: ([NPR Population, Q3 2024] - [NPR Population, Q3 2022]) / [NPR Population, Q3 2022] * 100
Stored the growth rates and locations in a new dataframe, ensuring that the column holding the province and territory names has the same name as the corresponding column in the shapefile from Statistics Canada's "2021 Census – Boundary files" (https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index2021-eng.cfm?year=21)
Uploaded the shapefile to the notebook, then merged the dataframe with the shapefile.
Plotted the merged data.
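The filtering and growth-rate steps above can be sketched on a toy table. This is only an illustration under stated assumptions: the region names and population figures are made up (they are not Statistics Canada data), and the wide layout with a QUARTER column is assumed to mirror the pre-processed CSV.

```python
import pandas as pd

# Hypothetical figures for illustration only; not Statistics Canada data.
npr = pd.DataFrame({
    "QUARTER": ["Q3 2021", "Q3 2022", "Q3 2024"],
    "Province A": [900, 1000, 1500],
    "Province B": [190, 200, 500],
})

# Keep only the two quarters being compared, indexed by quarter label
# so the arithmetic does not depend on row order.
by_quarter = npr.set_index("QUARTER").loc[["Q3 2022", "Q3 2024"]]

# Growth rate (%) = (end - start) / start * 100, computed per column.
start = by_quarter.loc["Q3 2022"]
end = by_quarter.loc["Q3 2024"]
growth = (end - start) / start * 100

# Reshape into a two-column table whose name column is called "PRENAME",
# the column name used by the shapefile.
growth_df = growth.rename("Growth Rate").rename_axis("PRENAME").reset_index()
print(growth_df)
```

Selecting rows by quarter label rather than by position is a design choice in this sketch; it avoids relying on the order in which the quarters appear in the file.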
Code Snippet for Geospatial Portion of the Project:
# Python code snippet to produce map #
############################################
## Install and import necessary libraries ##
############################################
# GeoPandas does the mapping below; geoplot is never imported, so it is not installed here
!pip install geopandas
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from google.colab import files
import zipfile
import io
######################
## Import csv files ##
######################
NPR_ProvTerr = pd.read_csv("Non-Permanent Migration.csv") # Total Number of Non-Permanent Residents by Province and Territory
##########################################################
## Non-Permanent Residents Growth Rate (Q3 2022-Q3 2024) ##
##########################################################
# The growth of NPRs accelerated starting at the end of 2022
# Let's map out its growth starting from Q3 2022 (July 2022)
NPR22_24 = NPR_ProvTerr[(NPR_ProvTerr['QUARTER'] == 'Q3 2024') | (NPR_ProvTerr['QUARTER'] == 'Q3 2022')]
NPR22_24 = NPR22_24.drop(['QUARTER'], axis=1)
NPR22_24
# Growth rate = [# of NPRs in Q3 2024 - # of NPRs in Q3 2022]/[# of NPRs in Q3 2022] multiplied by 100
# Note: this assumes the Q3 2022 row (iloc[0]) comes before the Q3 2024 row (iloc[1]) in the CSV
NPR_Growth2=((NPR22_24.iloc[1]-NPR22_24.iloc[0])/NPR22_24.iloc[0])*100
NPR_Growth1=['Canada', 'Newfoundland and Labrador', 'Prince Edward Island', 'Nova Scotia', 'New Brunswick', 'Quebec', 'Ontario',
'Manitoba', 'Saskatchewan', 'Alberta', 'British Columbia', 'Yukon', 'Northwest Territories', 'Nunavut']
# The name "PRENAME" was selected to match with the geographic column name of the shapefile
# that will be shown in a later step
NPR_Growth=pd.DataFrame({'PRENAME': NPR_Growth1, 'Growth Rate': NPR_Growth2})
# Drop the first row (the national total, 'Canada') so only provinces and territories remain
NPR_Growth.drop(NPR_Growth.index[0], inplace=True)
NPR_Growth
uploaded = files.upload()
# Upload zip file associated with shapefile corresponding to 2021 Census boundaries
# Then extract all files to retrieve shapefile
zf = zipfile.ZipFile(io.BytesIO(uploaded['lpr_000b21a_e.zip']), "r")
zf.extractall()
Canada = gpd.read_file("lpr_000b21a_e.shp")
# To check the column name corresponding to the names of the provinces and territories
# It is indeed "PRENAME"
Canada.head()
# Perform a join on the shapefile and NPR dataset
# The join matches rows on the values in the "PRENAME" column, so the province and territory
# names must be spelled EXACTLY THE SAME in both the shapefile and the dataset (row order does not matter)
CanMap = Canada.merge(NPR_Growth, on="PRENAME")
CanMap.head()
CanMap.plot(column="Growth Rate", cmap="Blues", legend=True,
            figsize=(12, 12))
plt.title("Growth Rate of Non-Permanent Residents (Q3 2022-Q3 2024)")
plt.show()
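Because the join matches on the text in PRENAME, a single misspelled name silently leaves a region without a growth rate. Below is a minimal sketch of one way to check for that, using plain pandas DataFrames as stand-ins for the shapefile and the growth table. The names and values are hypothetical, and it is an assumption here that the same `how` and `indicator` arguments can be passed when merging a GeoDataFrame.

```python
import pandas as pd

# Stand-in for the shapefile's attribute table (hypothetical subset of names).
shapes = pd.DataFrame({"PRENAME": ["Quebec", "Ontario", "Yukon"]})

# Stand-in for the growth table; rows are in a different order and one
# name is deliberately misspelled to show what a failed match looks like.
growth = pd.DataFrame({
    "PRENAME": ["Yukon", "Quebec", "Ontaro"],
    "Growth Rate": [10.0, 20.0, 30.0],  # hypothetical values
})

# how="left" keeps every shape; indicator=True adds a "_merge" column
# that records whether each row found a partner in the growth table.
check = shapes.merge(growth, on="PRENAME", how="left", indicator=True)

# Names present in the shapes table that found no growth rate.
unmatched = check.loc[check["_merge"] == "left_only", "PRENAME"].tolist()
print(unmatched)
```

In this sketch the differing row order causes no problem, while the misspelled name shows up in `unmatched`.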
Discussion:
Nunavut experienced by far the fastest non-permanent resident growth over that period, although that can largely be attributed to its small non-permanent resident population. Alberta also experienced high growth, mirroring its generally high overall growth over the past decade. Growth has slowed in British Columbia and Ontario, reflecting the fact that a sizable non-permanent resident population is not a new phenomenon in either province. Surprisingly, the Maritime provinces experienced relatively little growth despite government initiatives to attract immigrants and foreign labour to the region.
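The small-base effect mentioned for Nunavut can be illustrated with a short calculation. The region labels and populations below are hypothetical and chosen only to show the arithmetic; they are not figures from the dataset.

```python
import pandas as pd

# Hypothetical populations for illustration only; not Statistics Canada data.
pop = pd.DataFrame(
    {"start": [100, 100_000], "end": [300, 150_000]},
    index=["Small region", "Large region"],
)

# Absolute change and percentage growth side by side.
pop["absolute_change"] = pop["end"] - pop["start"]
pop["growth_pct"] = pop["absolute_change"] / pop["start"] * 100
print(pop)
```

In this toy case the small region gains far fewer people than the large one, yet its percentage growth is four times higher, which is why a growth-rate map is best read alongside the underlying population counts.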