linello

Reputation: 8694

Pandas+seaborn faceting with multidimensional dataframes

Using pandas, I need to build a facet grid from a DataFrame whose cells contain arrays. Columns a and b hold scalar values that represent the conditions of an experiment, while columns x and y each hold a numpy array. Column x is the x-axis of the data and column y is the corresponding function value f(x), so x and y always have the same number of elements.

I would now like to build a facet grid whose rows and columns specify the conditions and, in every cell of the grid, plot the values of column y against column x.

This could be a minimal working example:

import pandas as pd
d = [0]*4 # initialize a list with 4 elements
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d) # create the pandas dataframe

How can I use already existing faceting functions to address the issue of plotting y vs x grouped by the conditions a and b?

Since I need to apply this to general datasets with different column names, I would like to avoid resorting to hard-coded solutions and instead see whether seaborn's FacetGrid can be extended to this kind of problem.

Upvotes: 1

Views: 1106

Answers (3)

linello

Reputation: 8694

I believe the shortest and most readable solution is a purpose-built lambda function. It receives the variables named in the FacetGrid.map call as pandas Series and extracts the nested arrays with .values[0], since each combination of a and b occurs only once.

import pandas as pd
d = [0]*4 # initialize a list with 4 elements
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d) # create the pandas dataframe

import seaborn as sns
import matplotlib.pyplot as plt
grid = sns.FacetGrid(df,row='a',col='b')
grid.map(lambda _x,_y,**kwargs : plt.scatter(_x.values[0],_y.values[0]),'x','y')

[image: seaborn faceting with lambda functions]

Upvotes: 0

onno

Reputation: 979

I think the best way to go is to split the nested arrays first and then create a facet grid with seaborn.

Thanks to this post (Split nested array values from Pandas Dataframe cell over multiple rows) I was able to split the nested array in your dataframe:

unnested_lst = []
for col in df.columns:
    unnested_lst.append(df[col].apply(pd.Series).stack())
# forward-fill the scalar columns; .ffill() replaces the deprecated fillna(method='ffill')
result = pd.concat(unnested_lst, axis=1, keys=df.columns).ffill()

Then you can make the facet grid with this code:

import matplotlib.pyplot as plt
import seaborn as sbn
fg = sbn.FacetGrid(result, row='b', col='a')
fg.map(plt.scatter, "x", "y", color='blue')

Upvotes: 2

asongtoruin

Reputation: 10359

You need a long-form frame to be able to use FacetGrid, so your best bet is to explode the lists, then recombine and apply:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

d = [0]*4
d[0] = {'x':[1,2,3],'y':[4,5,6],'a':1,'b':2} # then fill these elements
d[1] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':3}
d[2] = {'x':[3,1,5],'y':[6,5,1],'a':1,'b':3}
d[3] = {'x':[3,1,5],'y':[6,5,1],'a':0,'b':2}
df = pd.DataFrame(d)

df.set_index(['a','b'], inplace=True, drop=True)

x_long = pd.melt(df['x'].apply(pd.Series).reset_index(),
                 id_vars=['a', 'b'], value_name='x')

y_long = pd.melt(df['y'].apply(pd.Series).reset_index(),
                 id_vars=['a', 'b'], value_name='y')

long_df = pd.merge(x_long, y_long).drop('variable', axis='columns')

grid = sns.FacetGrid(long_df, row='a', col='b')
grid.map(plt.scatter, 'x', 'y')
plt.show()

This will show you the following: [image: facet grid of y vs x scatter plots, one panel per combination of a and b]
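On pandas 1.3 or later, the melt-and-merge steps can likely be replaced by DataFrame.explode, which accepts several list columns at once. The sketch below assumes the lists in each row have equal length. The name to_long is a hypothetical helper, and the list columns are passed in as a parameter rather than hard-coded:

```python
import pandas as pd


def to_long(frame, list_cols):
    """Return a long-form copy with one row per element of the list columns."""
    # Exploding several columns together needs pandas >= 1.3 and
    # lists of matching length within each row.
    long_frame = frame.explode(list_cols, ignore_index=True)
    # Exploded columns keep object dtype, so convert them for plotting.
    for col in list_cols:
        long_frame[col] = pd.to_numeric(long_frame[col])
    return long_frame


df = pd.DataFrame([
    {'x': [1, 2, 3], 'y': [4, 5, 6], 'a': 1, 'b': 2},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 0, 'b': 3},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 1, 'b': 3},
    {'x': [3, 1, 5], 'y': [6, 5, 1], 'a': 0, 'b': 2},
])

long_df = to_long(df, ['x', 'y'])
```

The resulting long_df can then be passed to sns.FacetGrid(long_df, row='a', col='b') and mapped with plt.scatter in the same way as the melted frame.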

Upvotes: 1
