Robert Link

Reputation: 367

Combine data from multiple DataFrames into single plot w/o combining DataFrames

Due to memory constraints, I cannot load all of my data into my Jupyter notebook at once to generate the plot I want. My workaround is to load the data incrementally and update the plot with the data from each DataFrame in turn. Here is a dummy example of what I tried:

import matplotlib.pyplot as plt  # needed for plt.subplot below
import pandas as pd
import seaborn as sbn

df1 = pd.DataFrame({100 : [1,2,3], 200: [2,3,4]})
df2 = pd.DataFrame({300 : [4,5,6], 400: [5,6,7]})
df3 = pd.DataFrame({500 : [11,12,13,14], 600: [12,13,14,15]})

ax = plt.subplot(111)

sbn.boxplot(x = 'variable', y = 'value', color = 'yellow',
            data = pd.melt(df1), ax = ax)

sbn.boxplot(x = 'variable', y = 'value', color = 'yellow',
            data = pd.melt(df2), ax = ax)

sbn.boxplot(x = 'variable', y = 'value', color = 'yellow', 
            data = pd.melt(df3), ax = ax)

which results in this plot: [image: resulting plot]

I would like the end product to look like this:

[image: desired plot]

Any insight would be very welcome. Thank you!

Upvotes: 0

Views: 47

Answers (2)

mwaskom

Reputation: 49002

Define order = [100, 200, 300, 400, 500, 600] and pass it as the order argument in each call to boxplot, so that every call uses the same set of x positions.
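As a minimal sketch of this suggestion, the loop below draws one DataFrame at a time onto a shared Axes and passes the same order list to every call. The iter_frames generator is a hypothetical stand-in for whatever incremental loading is actually used, and the sketch assumes the full set of column labels is known before any data is loaded.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def iter_frames():
    """Hypothetical loader: yields one DataFrame at a time.

    In practice this would read each chunk from disk; here it just
    reproduces the dummy data from the question.
    """
    yield pd.DataFrame({100: [1, 2, 3], 200: [2, 3, 4]})
    yield pd.DataFrame({300: [4, 5, 6], 400: [5, 6, 7]})
    yield pd.DataFrame({500: [11, 12, 13, 14], 600: [12, 13, 14, 15]})


# Assumption: all category labels are known up front.
order = [100, 200, 300, 400, 500, 600]

fig, ax = plt.subplots()
for df in iter_frames():
    # Only one DataFrame is alive per iteration; the shared `order`
    # keeps each box at the position of its own label.
    sns.boxplot(x='variable', y='value', color='yellow',
                data=df.melt(), order=order, ax=ax)
```

Because each frame goes out of scope once it has been drawn, only the plotted artists accumulate on the Axes, not the underlying data.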

Upvotes: 1

JohanC

Reputation: 80299

You can concatenate the dataframes before calling melt:

import matplotlib.pyplot as plt  # needed for plt.subplots below
import pandas as pd
import seaborn as sns

df1 = pd.DataFrame({100 : [1,2,3], 200: [2,3,4]})
df2 = pd.DataFrame({300 : [4,5,6], 400: [5,6,7]})
df3 = pd.DataFrame({500 : [11,12,13,14], 600: [12,13,14,15]})

fig, ax = plt.subplots()
sns.boxplot(x = 'variable', y = 'value', color = 'yellow',
            data = pd.concat([df1, df2, df3]).melt(), ax = ax)

[image: box plot produced by concatenating, then melting]

Upvotes: 0
