Reputation: 8694
In Python pandas, I need to do a facet grid from a multidimensional DataFrame.
In columns a and b I hold scalar values, which represent conditions of an experiment.
In columns x and y instead I have two numpy arrays. Column x is the x-axis of the data and column y is the value of a function corresponding to f(x).
Obviously both x and y have the same number of elements.
I would now like to build a facet grid whose rows and columns specify the conditions, and in every cell of the grid plot the values of column y vs. column x.
This could be a minimal working example:
import pandas as pd
d = [0]*4 # initialize a list with 4 elements
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d) # create the pandas dataframe
How can I use already existing faceting functions to plot y vs x grouped by the conditions a and b?
Since I need to apply this to general datasets with different column names, I would like to avoid resorting to hard-coded solutions, and rather see whether it is possible to extend seaborn's FacetGrid to this kind of problem.
Upvotes: 1
Views: 1106
Reputation: 8694
I believe the shortest and most readable solution is to define a purpose-built lambda function. It takes as input the mapping variables passed by the FacetGrid.map method and extracts each array with .values[0], since every combination of a and b is unique and each facet therefore holds exactly one row.
import pandas as pd
d = [0]*4 # initialize a list with 4 elements
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d) # create the pandas dataframe
import seaborn as sns
import matplotlib.pyplot as plt
grid = sns.FacetGrid(df,row='a',col='b')
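# forward **kwargs so the color and label supplied by FacetGrid.map reach plt.scatter
grid.map(lambda _x, _y, **kwargs: plt.scatter(_x.values[0], _y.values[0], **kwargs), 'x', 'y')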
Upvotes: 0
Reputation: 979
I think the best way to go is to split the nested arrays first and then create a facet grid with seaborn.
Thanks to this post (Split nested array values from Pandas Dataframe cell over multiple rows) I was able to split the nested array in your dataframe:
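unnested_lst = []
for col in df.columns:
    # expand each cell into its own columns, then stack them into rows
    unnested_lst.append(df[col].apply(pd.Series).stack())
# forward-fill the scalar columns a and b; .ffill() replaces the deprecated fillna(method='ffill')
result = pd.concat(unnested_lst, axis=1, keys=df.columns).ffill()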
Then you can make the facet grid with this code:
import matplotlib.pyplot as plt
import seaborn as sbn
fg = sbn.FacetGrid(result, row='b', col='a')
fg.map(plt.scatter, "x", "y", color='blue')
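On newer pandas versions the unnesting step can probably be shortened with `DataFrame.explode`. This is an alternative to the loop above, not the method from the linked post, and it assumes pandas 1.3 or later (where `explode` accepts a list of columns) and that x and y have the same length in every row.

```python
import pandas as pd

df = pd.DataFrame([
    {'x': [1, 2, 3], 'y': [4, 5, 6], 'a': 1, 'b': 2},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 0, 'b': 3},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 1, 'b': 3},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 0, 'b': 2},
])

# Explode both list columns together: one output row per (x, y) pair,
# with the scalar columns a and b repeated alongside.
long_df = df.explode(['x', 'y'], ignore_index=True)

# Exploded columns keep object dtype, so cast them back to numbers.
long_df = long_df.astype({'x': 'int64', 'y': 'int64'})
```

The resulting frame has one row per point, so it can be handed to `FacetGrid` and mapped with `plt.scatter` in the same way as `result`.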
Upvotes: 2
Reputation: 10359
You need a long-form frame to be able to use FacetGrid, so your best bet is to explode the lists, then recombine and apply:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
d = [0]*4
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d)
df.set_index(['a','b'], inplace=True, drop=True)
x_long = pd.melt(df['x'].apply(pd.Series).reset_index(),
                 id_vars=['a', 'b'], value_name='x')
y_long = pd.melt(df['y'].apply(pd.Series).reset_index(),
                 id_vars=['a', 'b'], value_name='y')
long_df = pd.merge(x_long, y_long).drop('variable', axis='columns')
grid = sns.FacetGrid(long_df, row='a', col='b')
grid.map(plt.scatter, 'x', 'y')
plt.show()
This will show you a 2×2 grid of scatter plots, one panel for each combination of a (rows) and b (columns).
Upvotes: 1