ankien

Reputation: 25

How to fill locations of shapefile based on CSV data set?

I'm using GeoPandas in Python to create a heatmap of the state of Florida from a given CSV dataset and a shapefile of Florida:

[Image: screenshot of the plotting code and the CSV contents]

This is the code I have for displaying the state from the shapefile, along with the CSV dataset contents (a list of COVID cases in the Florida counties).

The shapefile also conveniently has the names of the counties along with their respective polygons:

[Image: shapefile attribute table with county names and polygons]

My plan is to parse the CSV, count the cases for each county, and then build a heatmap from those counts, but I'm unsure how to work with shapefiles.

Upvotes: 0

Views: 654

Answers (1)

Rob Raymond

Reputation: 31166

  • I believe I have found the same shapefile you are working with. I don't know the source of your COVID data, so I have used the NY Times data (https://github.com/nytimes/covid-19-data).
  • The usual term for a map-based heatmap is a choropleth.
  • The task is to line up your geometry with your COVID data:
    • The COVID data is by county, while the geometry is by county subdivision, so I have rolled the geometry up to county level to make the two consistent.
    • The geometry encodes FIPS in two columns (STATEFP and COUNTYFP), so I have created a new column that combines them. The COVID data has FIPS as a float, so I have converted it to a string.
    • After that, a pandas merge() joins the geometry and the COVID data.
  • Finally, generate the visual, which is a standard GeoPandas choropleth: https://geopandas.org/en/stable/docs/user_guide/mapping.html

import geopandas as gpd
import pandas as pd

gdf = gpd.read_file(
    "https://www2.census.gov/geo/tiger/TIGER2016/COUSUB/tl_2016_12_cousub.zip"
)

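# NY Times data is by county, not county subdivision, so roll the geometry up to county.
# dissolve() keeps the first row's attributes per group, so NAME here is a
# subdivision name, not the county name.
gdf_county = (
    gdf.dissolve("COUNTYFP")
    .reset_index()
    .assign(fips=lambda d: d["STATEFP"] + d["COUNTYFP"])
    .loc[:, ["fips", "geometry", "STATEFP", "COUNTYFP", "NAME"]]
)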

# get NY times data by county
df = pd.read_csv(
    "https://raw.githubusercontent.com/nytimes/covid-19-data/master/live/us-counties.csv"
)
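# limit to Florida and make fips a 5-character string, matching the geometry
# (zfill is a no-op for Florida's 12xxx codes, but keeps the key width explicit)
df_fl = (
    df.loc[df["state"].eq("Florida")]
    .dropna(subset=["fips"])
    .assign(fips=lambda d: d["fips"].astype(int).astype(str).str.zfill(5))
)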

# merge geometry and covid data
gdf_fl_covid = gdf_county.merge(df_fl, on="fips")

# interactive folium choropleth (needs folium installed; renders in a notebook)
gdf_fl_covid.explore(column="cases")
# static matplotlib choropleth
gdf_fl_covid.plot(column="cases")

[Image: resulting choropleth of Florida cases by county]
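
If some counties come out blank on the map, the usual cause is a FIPS key that did not line up. The following is a minimal sketch of how that step could be checked in isolation, using plain pandas and no network access. The two small tables and their case counts are made-up stand-ins for the real geometry and COVID data, and the names `counties`, `cases_keyed`, and `unmatched` are only illustrative:

```python
import pandas as pd

# Toy stand-in for the dissolved county table (geometry column omitted).
counties = pd.DataFrame(
    {
        "STATEFP": ["12", "12", "12"],
        "COUNTYFP": ["001", "003", "086"],
    }
).assign(fips=lambda d: d["STATEFP"] + d["COUNTYFP"])

# Toy stand-in for the case data: fips arrives as a float and one row has none.
# The case counts are invented for illustration.
cases = pd.DataFrame(
    {
        "county": ["Alachua", "Baker", "Unknown"],
        "fips": [12001.0, 12003.0, None],
        "cases": [100, 20, 5],
    }
)

# Same conversion as in the answer: drop missing keys, then float -> int -> str.
cases_keyed = cases.dropna(subset=["fips"]).assign(
    fips=lambda d: d["fips"].astype(int).astype(str).str.zfill(5)
)

# A left merge keeps every county; indicator=True adds a "_merge" column
# recording whether each county found a matching row in the case data.
merged = counties.merge(cases_keyed, on="fips", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"].eq("left_only"), "fips"].tolist()
print(unmatched)
```

With this toy data the script prints `['12086']`, because no case row carries that key. On the real data, an empty list would suggest every county geometry found a matching row.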

Upvotes: 1
