Diogo Neiss

Reputation: 117

seaborn.FacetGrid on array of arrays

I'm generating an array of arrays in a function and passing it to a pandas.DataFrame. I want to plot one histogram per array to validate the data.

My df looks like this: each column is one array and each row is an index into it. It's built using pd.DataFrame(np.array(nodeRemovals).T) (I also tried without the transpose).

       0     1
0   1257  2112
1   2527  1455
2    298    58
3   1762   155
4   1472  1695
5   1563  1327
6   2018  2589
7   1540  1104
8   1014  1939
9   2662   984
10  2477   364

I thought of defining a custom map function that selects one column at a time, but I couldn't figure out how to do it from the documentation on structured multi-plot grids.

Right now I'm doing the following, but it doesn't yield the mosaic of histograms I wanted.

g = sns.FacetGrid(nodeDf)
g.map(sns.histplot,  kde=True, bins=10, color='blue')
plt.show()

Upvotes: 0

Views: 316

Answers (1)

LudvigH

Reputation: 4753

First, I load your data:

import pandas as pd
from io import StringIO
a = """       0     1
0   1257  2112
1   2527  1455
2    298    58
3   1762   155
4   1472  1695
5   1563  1327
6   2018  2589
7   1540  1104
8   1014  1939
9   2662   984
10  2477   364"""
# Read the fixed-width text; the colspecs skip the index column (characters 0-3)
df_wide = pd.read_fwf(StringIO(a), colspecs=[(4, 10), (10, 16)])

I call this format "wide" because it has one column per source array, so the dataframe grows wider as arrays are added. We will now convert it into a "long" dataframe, where all values are stacked vertically in a single column and a second column records which array each value came from. That is the format seaborn is best designed to work with. It can handle wide data too, but I think long form leads to fewer problems and misunderstandings. See more in their documentation: https://seaborn.pydata.org/tutorial/data_structure.html#long-form-vs-wide-form-data

df_long = df_wide.melt(var_name='array', value_name='removals')
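To make the wide-to-long step concrete, here is a minimal sketch that starts from an array of arrays, as in the question. The name node_removals and the choice to keep only the first three values of each column are assumptions made for illustration; the melt call is the same as above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for nodeRemovals: two arrays holding the first
# three values of each column shown in the question.
node_removals = [[1257, 2527, 298], [2112, 1455, 58]]

# Transposing gives one column per array (wide form), as in the question.
df_wide = pd.DataFrame(np.array(node_removals).T)

# melt stacks every column into one 'removals' column and records the
# originating column label in 'array' (long form).
df_long = df_wide.melt(var_name='array', value_name='removals')
print(df_long)
```

Each row of the long frame is one observation, which is what lets seaborn split the data by the 'array' column.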

Finally, we use seaborn to plot. To make a grid of histograms, you could go the FacetGrid route as you did, but there is a convenience function that does exactly that: displot.

import seaborn as sns
import matplotlib.pyplot as plt
sns.displot(data=df_long, x='removals', col='array', color='blue', kde=True, bins=10)
plt.show()

The result looks like this:

[Image: two histograms side by side]

Upvotes: 1
