Reputation: 79
I am trying to plot a countplot with seaborn using multiple datasets that were sliced from a single dataset ("heart.csv").
I get the expected countplot with the code below:
df = pd.read_csv("heart.csv")
df['Sex'].value_counts()
sns.countplot(data=df, x='Sex')
However, I need to plot countplots for different datasets, or for different segments of a single dataset. I am slicing the dataset with the snippet below:
Clv = df.loc[0:302, :]
Hng = df.loc[303:(303+293), :]
Swtz = df.loc[(303+294):(303+294+122), :]
Lb = df.loc[(303+294+123):(303+294+123+199), :]
Stl = df.loc[(303+294+123+200):, :]
My question is: how do I plot countplots as subplots from different datasets, or from different segments of a single dataset?
Upvotes: 1
Views: 2217
Reputation: 80339
You could create an extra column, e.g. 'Category', and assign a value depending on the row index. Then you can use that new column as a differentiator in seaborn, e.g. as hue='Category' or x='Category':
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(2021)
df = pd.DataFrame({'Sex': np.random.choice(['F', 'M'], 1000),
                   'Other_info': np.random.rand(1000)})
df['Category'] = ''
# label each row range; .loc slices include both end points
df.loc[0:302, 'Category'] = 'Clv'
df.loc[303:(303 + 293), 'Category'] = 'Hng'
df.loc[(303 + 294):(303 + 294 + 122), 'Category'] = 'Swtz'
df.loc[(303 + 294 + 123):(303 + 294 + 123 + 199), 'Category'] = 'Lb'
df.loc[(303 + 294 + 123 + 200):, 'Category'] = 'Stl'
sns.set()
ax = sns.countplot(data=df, x='Category', hue='Sex', palette='mako')
# an alternative could be x='Sex', hue='Category'
plt.show()
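As a possible shortcut for the row-range labelling above, the same 'Category' column can be derived from the segment sizes with pd.cut. This is only a sketch: it assumes the default 0..n-1 integer index, uses synthetic data in place of heart.csv, and the names sizes, labels and edges are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for heart.csv (assumption: 1000 rows, default RangeIndex)
rng = np.random.default_rng(2021)
df = pd.DataFrame({'Sex': rng.choice(['F', 'M'], 1000)})

# Segment sizes implied by the slices in the question; the last segment takes the rest
sizes = [303, 294, 123, 200]
labels = ['Clv', 'Hng', 'Swtz', 'Lb', 'Stl']

# Bin edges over the row positions: [0, 303, 597, 720, 920, len(df)]
edges = [0, *(int(e) for e in np.cumsum(sizes)), len(df)]

# right=False makes every bin [start, stop), so row 303 is the first 'Hng' row
df['Category'] = pd.cut(df.index.to_numpy(), bins=edges, right=False, labels=labels)

print(df['Category'].value_counts())
```

The resulting column is categorical, so seaborn keeps the segments in the order given in labels when it is used as x, hue or col.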
A sns.catplot() using the new column as col= could look like:
sns.set()
g = sns.catplot(data=df, x='Sex', col='Category', sharey=True, height=4, aspect=0.5, palette='rocket', kind='count')
g.set(xlabel='')
plt.tight_layout()
plt.show()
Upvotes: 1
Reputation: 102862
You should put your resulting dataframes in a list and use a for loop to iterate over them one by one. Putting the relevant code in a function also helps you to not repeat yourself.
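import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_counts(dataframe):
    # print the counts; a bare expression is only displayed in a notebook
    print(dataframe['Sex'].value_counts())
    # start a new figure so successive calls do not draw on the same axes
    plt.figure()
    sns.countplot(data=dataframe, x='Sex')

df = pd.read_csv("heart.csv")
plot_counts(df)
Clv = df.loc[0:302, :]
Hng = df.loc[303:(303+293), :]
Swtz = df.loc[(303+294):(303+294+122), :]
Lb = df.loc[(303+294+123):(303+294+123+199), :]
Stl = df.loc[(303+294+123+200):, :]
df_list = [Clv, Hng, Swtz, Lb, Stl]
for dataframe in df_list:
    plot_counts(dataframe)
plt.show()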
If you only need to plot the 'Sex' column, you can eliminate the function definition and just call sns.countplot() directly:
df = pd.read_csv("heart.csv")
df['Sex'].value_counts()
sns.countplot(data=df, x='Sex')
Clv = df.loc[0:302, :]
Hng = df.loc[303:(303+293), :]
Swtz = df.loc[(303+294):(303+294+122), :]
Lb = df.loc[(303+294+123):(303+294+123+199), :]
Stl = df.loc[(303+294+123+200):, :]
df_list = [Clv, Hng, Swtz, Lb, Stl]
for dataframe in df_list:
    plt.figure()  # new figure per segment, otherwise all bars land on the same axes
    sns.countplot(data=dataframe, x='Sex')
plt.show()
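Since the question asks for subplots, one way to adapt the loop is to create a grid of Axes with plt.subplots() and pass each one to sns.countplot() through its ax= parameter. The following is a minimal sketch, not a definitive implementation: plot_segment_counts is a hypothetical helper (not part of seaborn), and the demo uses synthetic data instead of heart.csv.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_segment_counts(segments, column='Sex'):
    """Draw one countplot per segment, side by side, with a shared y-axis.

    `segments` maps a label to a DataFrame slice (hypothetical helper).
    """
    # Use the same bar order in every subplot, even if a slice lacks a value
    order = sorted(pd.concat(list(segments.values()))[column].dropna().unique())
    fig, axes = plt.subplots(1, len(segments), figsize=(3 * len(segments), 4),
                             sharey=True, squeeze=False)
    for ax, (label, frame) in zip(axes[0], segments.items()):
        sns.countplot(data=frame, x=column, order=order, ax=ax)
        ax.set_title(label)
        ax.set_xlabel('')
    fig.tight_layout()
    return fig, axes[0]


if __name__ == '__main__':
    # Synthetic stand-in for heart.csv (assumption: default integer index)
    rng = np.random.default_rng(2021)
    df = pd.DataFrame({'Sex': rng.choice(['F', 'M'], 1000)})
    segments = {
        'Clv': df.loc[0:302],
        'Hng': df.loc[303:596],
        'Swtz': df.loc[597:719],
        'Lb': df.loc[720:919],
        'Stl': df.loc[920:],
    }
    plot_segment_counts(segments)
    plt.show()
```

With sharey=True the bar heights are directly comparable across segments; drop it if the segment sizes differ so much that the small ones become unreadable.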
Upvotes: 0