Zapnuk

Reputation: 645

Categorical plot with data of multiple columns and their mean

I'd like to create a categorical plot of two pandas DataFrame columns, a and b, in the same figure, with a shared x-axis and separate y-axes:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

example = [
    ('exp1','f0', 0.25, 2),
    ('exp1','f1', 0.5, 3),
    ('exp1','f2', 0.75, 4),
    ('exp2','f1', -0.25, 1),
    ('exp2','f2', 1, 2),
    ('exp2','f3', 0, 3)
]
df = pd.DataFrame(example, columns=['exp', 'split', 'a', 'b'])
mean_df = df.groupby('exp')['a'].mean()
g = sns.catplot(x='exp', y='a', data=df, jitter=False)
ax2 = plt.twinx()
sns.catplot(x='exp', y='b', data=df, jitter=False, ax=ax2)

With this implementation, the colors differ by category (x-value) rather than by column. Can I solve this, or do I have to change the data structure?

I would also like to connect the means of the categorical values, as in the linked image.

Upvotes: 0

Views: 554

Answers (2)

Diziet Asahi

Reputation: 40667

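import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# `example` is the list of tuples from the question
df = pd.DataFrame(example, columns=['exp', 'split', 'a', 'b'])
# Select the numeric columns explicitly: the string column 'split' cannot be averaged
mean_df = df.groupby('exp')[['a', 'b']].mean().reset_index()

# Two y-axes sharing one x-axis
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# One fixed color per column, so color separates a from b instead of exp1 from exp2
sns.scatterplot(x='exp', y='a', data=df, color='C0', ax=ax1)
sns.scatterplot(x='exp', y='b', data=df, color='C1', ax=ax2)

# Connect the per-experiment means
sns.lineplot(x='exp', y='a', data=mean_df, color='C0', ax=ax1)
sns.lineplot(x='exp', y='b', data=mean_df, color='C1', ax=ax2)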


Upvotes: 1

Quang Hoang

Reputation: 150735

You may want to melt your data first:

data = df.melt(id_vars='exp', value_vars=['a','b'])

fig, ax = plt.subplots()
sns.scatterplot(data=data,
                x='exp',
                hue='variable',
                y='value',
                ax=ax)

(data.groupby(['exp','variable'])['value']
     .mean()
     .unstack('variable')
     .plot(ax=ax, legend=False)
)
ax.set_xlim(-0.5, 1.5);
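
To see what the melt step produces, here is a small pandas-only sketch using the example data from the question. The names long_df and means are made up for illustration and do not appear in the code above.

```python
import pandas as pd

# Example data copied from the question
example = [
    ('exp1', 'f0', 0.25, 2),
    ('exp1', 'f1', 0.5, 3),
    ('exp1', 'f2', 0.75, 4),
    ('exp2', 'f1', -0.25, 1),
    ('exp2', 'f2', 1, 2),
    ('exp2', 'f3', 0, 3),
]
df = pd.DataFrame(example, columns=['exp', 'split', 'a', 'b'])

# Long form: one row per (experiment, column) observation.
# 'variable' holds the former column name, 'value' the measurement.
long_df = df.melt(id_vars='exp', value_vars=['a', 'b'])
print(long_df.columns.tolist())  # ['exp', 'variable', 'value']

# Wide table of means: one row per experiment, one column per variable.
# This is the table that gets drawn as lines.
means = long_df.groupby(['exp', 'variable'])['value'].mean().unstack('variable')
print(means)
```

Since the melted values all land on a single y-axis, this layout works best when a and b have comparable ranges; separate y-axes still need a twin axis.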


Upvotes: 1
