RndmSymbl

Reputation: 543

Seaborn color bar on FacetGrid for histplot with normalized color mapping

I seem unable to show the color bar for a two-dimensional histplot using a seaborn FacetGrid. Can someone point me to the missing link, please?

I understand that similar solutions have been discussed, but I have not been able to adapt them to my use case:

  1. Has the right position and values for the color bar, but doesn't work with histplot
  2. This proposal does not run at all and is rather dated, so I am not sure it is still supposed to work
  3. Seems to use fixed vmin/vmax and does not work with histplot

Specifically, I am looking to extend the code below so that a color bar is shown.

import random
import string

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

# 250 consecutive days starting 1800-01-01, repeated four times (1000 rows)
dates = pd.to_datetime(4 * [d.strftime('%Y-%m-%d') for d in pd.date_range('1800-01-01', periods=250, freq='1d')])
df = pd.DataFrame(list(zip([random.randint(0, 10) for i in range(1000)],
                           dates,
                           [random.choice(string.ascii_letters[26:30]) for i in range(1000)])),  # "A".."D"
                  columns=["range", "date", "case_type"])
df["range"] = df["range"].astype(float)  # the column will also hold floats (case "C")

# Give the case types very different value ranges; the number of new values must
# match the number of selected rows, and .loc avoids chained assignment
mask_a = df["case_type"] == "A"
df.loc[mask_a, "range"] = [random.randint(4562, 873645) for i in range(mask_a.sum())]
mask_c = df["case_type"] == "C"
df.loc[mask_c, "range"] = [random.random() for i in range(mask_c.sum())]

fg = sns.FacetGrid(df, col="case_type", col_wrap=2, sharey=False)
# FacetGrid.map passes each facet's subset of "date" and "range" to histplot
fg.map(sns.histplot, "date", "range", stat="count")
fg.set_xticklabels(rotation=30)
plt.show()

The objective is to have a color bar on the right side of the facet grid, spanning the entire chart: two rows here, but more may be shown. The 2D histograms display very different data types, so the counts per bin (and hence per color) are likely to differ widely, and it matters to know whether "dark blue" means 100 or 1000.
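To illustrate why per-facet color scales are ambiguous, here is a minimal sketch with `matplotlib.colors.Normalize` and two made-up peak counts (100 and 1000, hypothetical values not taken from the data above). Normalized per facet, both peaks map to the top of the colormap; with one shared normalization they stay distinguishable.

```python
from matplotlib.colors import Normalize

# Hypothetical peak bin counts of two facets (illustrative values only)
peak_counts = {"facet_1": 100, "facet_2": 1000}

# Independent scaling: each facet maps its counts onto its own [0, max] range
independent = {name: float(Normalize(vmin=0, vmax=peak)(peak))
               for name, peak in peak_counts.items()}

# Shared scaling: a single Normalize instance spans all facets
shared_norm = Normalize(vmin=0, vmax=max(peak_counts.values()))
shared = {name: float(shared_norm(peak)) for name, peak in peak_counts.items()}

print(independent)  # both peaks land at the same (darkest) colormap position
print(shared)       # the smaller peak now lands at a tenth of the scale
```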

[Image: 2D histogram in search of a colorbar]

EDIT: For the sake of clarity, it appears from the comments that the problem breaks down into two steps:

  1. How to normalize the color coding across all plots, and
  2. How to display a color bar on the right side of the plot using the normalized color mapping
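The two steps listed above can be sketched with plain matplotlib, independently of seaborn: compute every facet's count array with `np.histogram2d`, build one shared `Normalize` from the global maximum, draw each facet with `pcolormesh` using that norm, and attach a single colorbar to all axes. This is a minimal sketch, not how seaborn works internally; the random data and the helper name `plot_shared_colorbar` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize


def plot_shared_colorbar(groups, bins=20, cmap="Blues", ncols=2):
    """Hypothetical helper: one 2D histogram per group, one shared colorbar."""
    # Step 1: compute every count array first, then one common normalization
    hists = {name: np.histogram2d(x, y, bins=bins) for name, (x, y) in groups.items()}
    vmax = max(counts.max() for counts, _, _ in hists.values())
    norm = Normalize(vmin=0, vmax=vmax)

    nrows = -(-len(hists) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
    mesh = None
    for ax, (name, (counts, xedges, yedges)) in zip(axes.flat, hists.items()):
        # histogram2d indexes counts as [x_bin, y_bin]; pcolormesh wants [row=y, col=x]
        mesh = ax.pcolormesh(xedges, yedges, counts.T, cmap=cmap, norm=norm)
        ax.set_title(name)

    # Step 2: one colorbar that takes its space from all subplots at once
    cbar = fig.colorbar(mesh, ax=axes.ravel().tolist())
    return fig, cbar, norm


# Hypothetical data: four groups of very different sizes
rng = np.random.default_rng(0)
groups = {name: (rng.normal(size=n), rng.normal(size=n))
          for name, n in [("A", 800), ("B", 100), ("C", 30), ("D", 70)]}
fig, cbar, norm = plot_shared_colorbar(groups)
```

Because every `pcolormesh` call receives the same `Normalize` object, it does not matter which facet's mesh is handed to `fig.colorbar`.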

Upvotes: 3

Views: 1976

Answers (1)

Mr. T

Reputation: 12410

I am not sure there is a built-in seaborn way to achieve your desired plot. But we can pre-compute sensible values for the bin number and vmin/vmax and apply them to all histplots:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

#generate a test dataset with different case_type probabilities
np.random.seed(123)
p1, p2, p3 = 0.8, 0.1, 0.03
df = pd.DataFrame(list(zip(np.random.randint(0, 20, 1000), 
                  pd.to_datetime(4 * [d.strftime('%Y-%m-%d') for d in pd.date_range('1800-01-01', periods=250, freq='1d')]),
                  np.random.choice(list("ABCD"),size=1000, p=[p1, p2, p3, 1-(p1+p2+p3)]))), 
                  columns=["range","date","case_type"])
df.loc[df.case_type == "A", "range"] *=   3
df.loc[df.case_type == "B", "range"] *=  23
df.loc[df.case_type == "C", "range"] *= 123

#express dates as days since the first date: np.histogram/np.histogram2d need plain
#numbers, and (unlike YYYYMMDD integers) days are evenly spaced in time
df_days = (df["date"] - df["date"].min()).dt.days

#determine the bin number for the x-axis
_, bin_edges = np.histogram(df_days, bins="auto")
bin_nr = len(bin_edges)-1

#predetermine min and max count for each category
#(each histplot call bins only its own facet's data with bin_nr bins, so do the same here)
c_types = df["case_type"].unique()
vmin_list, vmax_list = [], []
for c_type in c_types:
    mask = df.case_type == c_type
    arr, _, _ = np.histogram2d(df_days[mask], df.loc[mask, "range"], bins=bin_nr)
    vmin_list.append(arr.min())
    vmax_list.append(arr.max())
    
#find lowest and highest counts for all subplots
vmin_all = min(vmin_list)
vmax_all = max(vmax_list)

#now we are ready to plot
fg = sns.FacetGrid(df, col="case_type", col_wrap=2, sharey=False)
#create common colorbar axis
cax = fg.fig.add_axes([.92, .12, .02, .8])
#map colorbar to colorbar axis with common vmin/vmax values
#(FacetGrid.map already passes each facet's subset, so no data= argument is needed)
fg.map(sns.histplot, "date", "range", stat="count", bins=bin_nr, vmin=vmin_all, vmax=vmax_all, cbar=True, cbar_ax=cax)
#prevent overlap
fg.fig.subplots_adjust(right=.9)
fg.set_xticklabels(rotation=30)

plt.show()
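The per-category loop that finds the shared count limits can also be wrapped in a small reusable function. This is a sketch under the same assumption the answer makes (each facet is binned over its own data range with the same number of bins); `shared_count_range` is a hypothetical name, and datetime columns are converted to days because `np.histogram2d` expects plain numbers.

```python
import numpy as np
import pandas as pd


def shared_count_range(df, x, y, by, bins):
    """Hypothetical helper: smallest and largest 2D-histogram bin count across groups.

    Datetime columns are converted to days since the group's first date; such a
    linear shift should not change which bin a point falls into.
    """
    lows, highs = [], []
    for _, sub in df.groupby(by):
        xs = sub[x]
        if pd.api.types.is_datetime64_any_dtype(xs):
            xs = (xs - xs.min()).dt.total_seconds() / 86400.0
        counts, _, _ = np.histogram2d(xs, sub[y], bins=bins)
        lows.append(counts.min())
        highs.append(counts.max())
    return min(lows), max(highs)


# Usage on a tiny hypothetical frame
demo = pd.DataFrame({
    "date": pd.to_datetime(["1800-01-01", "1800-01-02", "1800-01-02", "1800-01-03"]),
    "range": [1, 2, 2, 3],
    "case_type": ["A", "A", "A", "B"],
})
vmin, vmax = shared_count_range(demo, "date", "range", "case_type", bins=2)
```

The returned pair would then be passed as `vmin=`/`vmax=` to every `sns.histplot` call, exactly as `vmin_all`/`vmax_all` are in the answer.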

Sample output: [image not included]
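The hand-placed colorbar axis (`add_axes([.92, .12, .02, .8])`) has to be re-tuned whenever the number of facet rows changes. As an alternative sketch, matplotlib can draw a colorbar from a bare `ScalarMappable` that only knows the shared norm and colormap, and `fig.colorbar(..., ax=[...])` can take the space from all subplots. Plain `plt.subplots` axes stand in for the facets here; with a FacetGrid one would presumably pass `fg.fig` and `fg.axes.ravel().tolist()` instead. The limits and the colormap are hypothetical, and the colormap must match the one histplot actually used (for example by passing the same `cmap` to histplot), which is worth verifying.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Hypothetical shared limits, e.g. vmin_all/vmax_all computed as in the answer
vmin_all, vmax_all = 0, 42

# Stand-in for the facet axes: two rows of two subplots
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# A ScalarMappable needs no plotted data, only the norm and the colormap;
# "Blues" is an assumption and must be the colormap the facets were drawn with
sm = ScalarMappable(norm=Normalize(vmin=vmin_all, vmax=vmax_all), cmap="Blues")
sm.set_array([])  # required by some older matplotlib versions before colorbar()

# Passing every subplot as `ax` makes the colorbar span all rows
cbar = fig.colorbar(sm, ax=axes.ravel().tolist())
cbar.set_label("count")
```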

You may also notice that I changed your sample dataframe so that the case_types occur at different frequencies; otherwise there is little visible difference between the histplots. Also be aware that the facets are plotted in the order in which the case_types first appear in the dataframe, which might not be the order you want in your graph; the `col_order` parameter of `sns.FacetGrid` lets you set it explicitly.
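Besides passing `col_order` to `sns.FacetGrid`, the facet order can be fixed in the data itself: seaborn follows the category order when the facet column is a pandas categorical, and falls back to the order of first appearance for plain string columns. A minimal sketch with a hypothetical toy column:

```python
import pandas as pd

# Hypothetical toy column in which "C" happens to appear first
df = pd.DataFrame({"case_type": list("CABDCA")})

# For a plain string column, facets follow the order of first appearance
appearance_order = list(df["case_type"].unique())

# A categorical column carries its own order, independent of the row order
df["case_type"] = pd.Categorical(df["case_type"], categories=sorted(appearance_order))
category_order = list(df["case_type"].cat.categories)
```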

Disclaimer: This is largely based on mwaskom's answer.

Upvotes: 1
