Reputation: 492

Use GeoPandas to plot missing values

My shapefile has missing values (stored as NaN) in some columns, for example GDP. When I plot such a column without handling the missing values, the legend looks like this:

(screenshot of the legend, image not preserved)

which is not what I want. So I replaced the missing values with the string "missing" and plotted again. Not surprisingly, I got an error: TypeError: '<' not supported between instances of 'str' and 'float'.

My questions are: 1. How does GeoPandas treat missing values? Does it store them as strings or as some other data type? 2. How can I keep the missing values and have the legend show a label for them?

Upvotes: 5

Answers (3)

steven

Reputation: 2519

import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import mapclassify as mc  # standalone package; the original used pysal.viz.mapclassify
from matplotlib.colors import ListedColormap, rgb2hex
# on Matplotlib >= 3.6 this style is named 'seaborn-v0_8'
plt.style.use('seaborn')

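# note: gpd.datasets was removed in GeoPandas 1.0; on newer versions,
# pass the path to your own copy of the Natural Earth file instead
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# generate random data
gdf['random'] = np.random.normal(100, 10, len(gdf))
# assign missing values to up to 40 randomly chosen rows
# (np.random.choice samples with replacement, so some rows may repeat)
gdf.loc[np.random.choice(gdf.index, 40), 'random'] = np.nan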

The basic idea is to create a categorical (string) column from your numerical data, using whichever classification method you want (e.g., quantiles, percentiles, etc.). We then plot that string column, which lets us pass a customized colormap with grey added to represent missing values.
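To see the binning and labelling step in isolation, here is a minimal sketch that uses only the standard library. The helper name label_bins, the sample values, and the upper bounds are made up for illustration; bisect_left stands in for the classifier's bin lookup, under the assumption that each bin is closed on its upper bound.

```python
import math
from bisect import bisect_left


def label_bins(values, upper_bounds, missing_label="No Data"):
    """Return one string label per value: a bin index, or missing_label for NaN/None."""
    labels = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            # missing values get their own category instead of a bin index
            labels.append(missing_label)
        else:
            # index of the first upper bound that is >= v
            labels.append(str(bisect_left(upper_bounds, v)))
    return labels


# hypothetical data: one missing value and three bins
values = [95.0, float("nan"), 110.0, 101.5]
upper_bounds = [98.0, 104.0, 120.0]

labels = label_bins(values, upper_bounds)
print(labels)  # ['0', 'No Data', '2', '1']
```

Because every label is a string, including the one for missing values, the column can be treated as categorical and no str-versus-float comparison takes place.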

# categorize the numerical column
k = 5
quantiles = mc.Quantiles(gdf.random.dropna(), k=k)
gdf['random_cat'] = quantiles.find_bin(gdf.random).astype('str')

gdf.loc[gdf.random.isnull(), 'random_cat'] = 'No Data'

# add grey to a colormap to represent missing values
# (plt.get_cmap is used because plt.cm.get_cmap was removed in Matplotlib 3.9)
cmap = plt.get_cmap('Blues', k)
cmap_list = [rgb2hex(cmap(i)) for i in range(cmap.N)]
cmap_list.append('grey')
cmap_with_grey = ListedColormap(cmap_list)

# plot map
fig, ax = plt.subplots(figsize=(12, 10))
gdf.plot(column='random_cat', edgecolor='k', cmap=cmap_with_grey,
         legend=True, legend_kwds=dict(loc='center left'),
         ax=ax)

# get all upper bounds in the quantiles category
upper_bounds = quantiles.bins
# get and format all bounds
bounds = []
for index, upper_bound in enumerate(upper_bounds):
    if index == 0:
        lower_bound = gdf.random.min()
    else:
        lower_bound = upper_bounds[index-1]

    bound = f'{lower_bound:.2f} - {upper_bound:.2f}'
    bounds.append(bound)

# get all the legend labels
legend_labels = ax.get_legend().get_texts()
# replace the numerical legend labels
for bound, legend_label in zip(bounds, legend_labels):
    legend_label.set_text(bound)

You may want to take a look at the following posts:

format/round numerical legend label in GeoPandas

Extract matplotlib colormap in hex-format

Matplotlib.colors.ListedColormap in python

Change main plot legend label text

Update as of geopandas 0.8.1:

You can now simply pass a missing_kwds argument to the plot function:

fig, ax = plt.subplots(figsize=(12, 10))

missing_kwds = dict(color='grey', label='No Data')

gdf.plot(column='random', scheme='Quantiles', k=5,
         legend=True, legend_kwds=dict(loc='center left'),
         missing_kwds=missing_kwds, ax=ax)

Upvotes: 8

CPBL

Reputation: 4030

Update: a newer GeoPandas feature solves your problem. You can leave your missing values as NaN and use:

ax = gdf.plot(column='GDP',  # plus your other arguments
              missing_kwds=dict(color='lightgrey'))

This makes all regions with missing data light grey.

See https://geopandas.readthedocs.io/en/latest/mapping.html. The rendered documentation may show the parameter as missing_kwdsdict, which looks like the name missing_kwds run together with its type, dict; missing_kwds is what works for me.
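On the first question, about how missing values are stored: in a numeric column, NaN is an ordinary floating-point value, not a string. The sketch below uses only the standard library to show that, and to reproduce the kind of comparison that fails once a string such as "missing" is mixed in with floats. The helper can_sort is hypothetical and only illustrates the point; it is not what GeoPandas does internally.

```python
import math

nan = float("nan")

print(type(nan).__name__)  # float
print(nan == nan)          # False: NaN never compares equal, even to itself
print(math.isnan(nan))     # True: the reliable way to test for NaN


def can_sort(values):
    """Return True if the values can be ordered with '<', False on a TypeError."""
    try:
        sorted(values)
        return True
    except TypeError:
        return False


numeric_with_nan = [3.2, nan, 1.5]      # all floats, so comparisons are allowed
with_string = [3.2, "missing", 1.5]     # str vs float cannot be compared

print(can_sort(numeric_with_nan))  # True
print(can_sort(with_string))       # False
```

Keeping the values as NaN therefore keeps the column numeric, which is what missing_kwds expects.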

Upvotes: 2

martinfleis

Reputation: 7814

GeoPandas does not support plotting missing values at the moment; this is planned for the 0.7 release. A possible workaround is to plot the rows without missing values first, and then plot the rows with missing values on the same axes. As you did not give us any code, below is an example from https://nbviewer.jupyter.org/gist/jorisvandenbossche/bb1cc71f94aa3e8f2832f18dd12f6174

import geopandas
import numpy as np

gdf = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))

# Introduce some missing values:
gdf.loc[np.random.choice(gdf.index, 20), 'pop_est'] = np.nan

ax = gdf[gdf.pop_est.notna()].plot(column='pop_est', figsize=(15, 10), legend=True)
gdf[gdf.pop_est.isna()].plot(color='lightgrey', hatch='///', ax=ax)

Upvotes: 1
