Reputation: 492
My shapefile has some missing values (represented by nan) in certain columns (for example, GDP). When I plot without dealing with those missing values, the legend looks like this:
which is not what I want.
So I replaced the missing values with the string "missing" and redid the plot. Not surprisingly, I got the error TypeError: '<' not supported between instances of 'str' and 'float'.
My questions are:
1. How does GeoPandas treat missing values? Does it store them as strings or as some other data type?
2. How can I keep those missing values and redo the plot so that the legend labels show the missingness?
Upvotes: 5
Views: 5092
Reputation: 2519
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import pysal.viz.mapclassify as mc
from matplotlib.colors import rgb2hex
from matplotlib.colors import ListedColormap
plt.style.use('seaborn')  # this style is named 'seaborn-v0_8' in Matplotlib 3.6 and later
gdf = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
# generate random data
gdf['random'] = np.random.normal(100, 10, len(gdf))
# assign missing values
gdf.loc[np.random.choice(gdf.index, 40), 'random'] = np.nan
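On the first part of the question: in a numeric column the missing values are ordinary float NaN values, not strings. Replacing them with a string therefore leaves a column that mixes str and float, and such values cannot be ordered, which is most likely where the TypeError comes from. A minimal sketch of that failure using only the standard library (it illustrates the comparison problem, it is not GeoPandas' own code):

```python
import math

# A numeric column with a gap: the missing entry is a float NaN, not a string.
values = [3.2, float("nan"), 1.5]
nan_is_float = isinstance(values[1], float) and math.isnan(values[1])

# Replacing NaN with a label makes the column mix str and float ...
mixed = ["missing" if math.isnan(v) else v for v in values]

# ... and ordering such values raises a TypeError like the one in the question.
error_message = None
try:
    sorted(mixed)
except TypeError as exc:
    error_message = str(exc)

print(nan_is_float, error_message)
```

Converting the whole column to strings, as done below, avoids mixing the two types.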
The basic idea is to create a categorical (string) column from your numerical data using whichever classification method you want (e.g., quantiles, percentiles). Plotting that string column then lets us pass a customized colormap that includes a grey color for the missing values.
# categorize the numerical column
k = 5
quantiles = mc.Quantiles(gdf.random.dropna(), k=k)
gdf['random_cat'] = quantiles.find_bin(gdf.random).astype('str')
gdf.loc[gdf.random.isnull(), 'random_cat'] = 'No Data'
# add grey to a colormap to represent missing value
cmap = plt.cm.get_cmap('Blues', k)
cmap_list = [rgb2hex(cmap(i)) for i in range(cmap.N)]
cmap_list.append('grey')
cmap_with_grey = ListedColormap(cmap_list)
# plot map
fig, ax = plt.subplots(figsize=(12, 10))
gdf.plot(column='random_cat', edgecolor='k', cmap=cmap_with_grey,
         legend=True, legend_kwds=dict(loc='center left'),
         ax=ax)
# get all upper bounds in the quantiles category
upper_bounds = quantiles.bins
# get and format all bounds
bounds = []
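for index, upper_bound in enumerate(upper_bounds):
    if index == 0:
        # the first bin starts at the smallest observed value
        lower_bound = gdf.random.min()
    else:
        # every other bin starts at the previous bin's upper bound
        lower_bound = upper_bounds[index-1]
    bound = f'{lower_bound:.2f} - {upper_bound:.2f}'
    bounds.append(bound)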
# get all the legend labels
legend_labels = ax.get_legend().get_texts()
# replace the numerical legend labels
for bound, legend_label in zip(bounds, legend_labels):
    legend_label.set_text(bound)
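If you need these labels in more than one place, the loop that builds them can be pulled into a small helper. This is only a sketch; the name format_bin_labels and its fmt parameter are made up for illustration and are not part of geopandas or mapclassify:

```python
def format_bin_labels(lowest, upper_bounds, fmt="{:.2f}"):
    """Return one 'lower - upper' label per bin.

    lowest is the lower edge of the first bin; every later bin starts
    at the previous bin's upper bound.
    """
    labels = []
    lower = lowest
    for upper in upper_bounds:
        labels.append(f"{fmt.format(lower)} - {fmt.format(upper)}")
        lower = upper
    return labels


# Made-up numbers, only to show the shape of the result.
labels = format_bin_labels(70.0, [90.0, 100.0, 110.5])
print(labels)
```

With the variables from the code above, the call would be format_bin_labels(gdf.random.min(), quantiles.bins).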
You may want to take a look at the following posts:
format/round numerical legend label in GeoPandas
Extract matplotlib colormap in hex-format
Matplotlib.colors.ListedColormap in python
Change main plot legend label text
Update as of geopandas 0.8.1: You can now simply pass a missing_kwds argument to the plot function:
fig, ax = plt.subplots(figsize=(12, 10))
missing_kwds = dict(color='grey', label='No Data')
gdf.plot(column='random', scheme='Quantiles', k=5,
         legend=True, legend_kwds=dict(loc='center left'),
         missing_kwds=missing_kwds, ax=ax)
Upvotes: 8
Reputation: 4030
Update: A new feature in geopandas solves your problem. You can leave your missing values as NaN and use:
ax = gdf.plot(<other arguments>,
              missing_kwds=dict(color="lightgrey"))
This makes all regions with missing data light grey.
See https://geopandas.readthedocs.io/en/latest/mapping.html
(The documentation may appear to call the parameter missing_kwdsdict, but that is the parameter name missing_kwds followed by its type, dict. Passing missing_kwds as above is what works for me.)
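For a version that runs without any dataset download, here is a minimal sketch assuming geopandas 0.7 or later with matplotlib and shapely installed. The three squares and the column name value are invented for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import geopandas as gpd
from shapely.geometry import box

# Three unit squares standing in for regions; the middle one has no value.
gdf = gpd.GeoDataFrame(
    {"value": [1.0, np.nan, 3.0]},
    geometry=[box(i, 0, i + 1, 1) for i in range(3)],
)

fig, ax = plt.subplots()
# Rows with NaN in "value" are drawn with the style given in missing_kwds.
gdf.plot(column="value", missing_kwds={"color": "lightgrey"}, ax=ax)
```

The NaN row stays in the GeoDataFrame; only its drawing style changes.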
Upvotes: 2
Reputation: 7814
GeoPandas does not support plotting missing values at the moment; this is planned for the 0.7 release. A possible solution is to plot the rows without missing values first and then plot the rows with missing values on the same axes. As you did not give us any code, below is an example from https://nbviewer.jupyter.org/gist/jorisvandenbossche/bb1cc71f94aa3e8f2832f18dd12f6174
import numpy as np
import geopandas
gdf = geopandas.read_file(geopandas.datasets.get_path('naturalearth_lowres'))
# Introduce some missing values:
gdf.loc[np.random.choice(gdf.index, 20), 'pop_est'] = np.nan
ax = gdf[gdf.pop_est.notna()].plot(column='pop_est', figsize=(15, 10), legend=True)
gdf[gdf.pop_est.isna()].plot(color='lightgrey', hatch='///', ax=ax)
Upvotes: 1