Md. Rezuwan Hassan

Reputation: 79

How do I plot a countplot using the same column from multiple similar datasets?

I am trying to plot a countplot with seaborn using multiple datasets that were sliced from a single dataset ("heart.csv").

The code below gives me the countplot I expect for the whole dataset:

import pandas as pd
import seaborn as sns

df = pd.read_csv("heart.csv")
print(df['Sex'].value_counts())
sns.countplot(data=df, x='Sex')

However, I need a countplot for each of several datasets, or for each segment of a single dataset. I am slicing the dataset into segments with the snippet below.

Clv = df.loc[0:302, :]
Hng = df.loc[303:(303+293), :]
Swtz = df.loc[(303+294):(303+294+122), :]
Lb = df.loc[(303+294+123):(303+294+123+199), :]
Stl = df.loc[(303+294+123+200):, :]

My question is: how do I plot count subplots from different datasets, or from different segments of a single dataset?

Upvotes: 1

Views: 2217

Answers (2)

JohanC

Reputation: 80339

You could create an extra column, e.g. 'Category', and assign it a value depending on the row index. You can then use that new column as a differentiator in seaborn, e.g. as hue='Category' or x='Category':

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

np.random.seed(2021)
df = pd.DataFrame({'Sex': np.random.choice(['F', 'M'], 1000),
                   'Other_info': np.random.rand(1000)})
df['Category'] = ''
# Label each index range; .loc slices include both endpoints
df.loc[0:302, 'Category'] = 'Clv'
df.loc[303:(303 + 293), 'Category'] = 'Hng'
df.loc[(303 + 294):(303 + 294 + 122), 'Category'] = 'Swtz'
df.loc[(303 + 294 + 123):(303 + 294 + 123 + 199), 'Category'] = 'Lb'
df.loc[(303 + 294 + 123 + 200):, 'Category'] = 'Stl'

sns.set()
ax = sns.countplot(data=df, x='Category', hue='Sex', palette='mako')
# an alternative could be x='Sex', hue='Category'
plt.show()

sns.countplot using new column

A sns.catplot() using the new column as col= could look like:

sns.set()
g = sns.catplot(data=df, x='Sex', col='Category', sharey=True, height=4, aspect=0.5, palette='rocket', kind='count')
g.set(xlabel='')
plt.tight_layout()
plt.show()

sns.catplot using new column as col=
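
As a variation on assigning the category by row index, the labels could also be derived in one step with pd.cut. This is only a sketch: it assumes a default integer index and reuses the slice boundaries from the question, and the synthetic 1000-row dataframe stands in for "heart.csv".

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for heart.csv (assumption: 1000 rows, default RangeIndex)
rng = np.random.default_rng(2021)
df = pd.DataFrame({'Sex': rng.choice(['F', 'M'], 1000)})

# Right-inclusive bin edges matching the slices in the question:
# 0..302, 303..596, 597..719, 720..919, 920..end
edges = [-1, 302, 596, 719, 919, len(df) - 1]
labels = ['Clv', 'Hng', 'Swtz', 'Lb', 'Stl']

# Each row gets the label of the bin its index value falls into
df['Category'] = pd.cut(df.index.to_numpy(), bins=edges, labels=labels)
print(df['Category'].value_counts(sort=False))
```

The resulting 'Category' column can be passed to seaborn as hue=, x= or col= in the same way as the column built with .loc.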

Upvotes: 1

MattDMo

Reputation: 102862

You should put your resulting dataframes in a list and use a for loop to iterate over them one by one. Putting the relevant code in a function also helps you to not repeat yourself.

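import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

def plot_counts(dataframe):
    # Print the counts, then draw them on a fresh figure
    # so that successive calls don't overlap on the same axes
    print(dataframe['Sex'].value_counts())
    plt.figure()
    sns.countplot(data=dataframe, x='Sex')

df = pd.read_csv("heart.csv")
plot_counts(df)

Clv = df.loc[0:302, :]
Hng = df.loc[303:(303+293), :]
Swtz = df.loc[(303+294):(303+294+122), :]
Lb = df.loc[(303+294+123):(303+294+123+199), :]
Stl = df.loc[(303+294+123+200):, :]
df_list = [Clv, Hng, Swtz, Lb, Stl]

for dataframe in df_list:
    plot_counts(dataframe)

plt.show()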

If you only need to plot the 'Sex' column in this one place, you can eliminate the function definition and just call sns.countplot() directly:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.read_csv("heart.csv")
print(df['Sex'].value_counts())
sns.countplot(data=df, x='Sex')

Clv = df.loc[0:302, :]
Hng = df.loc[303:(303+293), :]
Swtz = df.loc[(303+294):(303+294+122), :]
Lb = df.loc[(303+294+123):(303+294+123+199), :]
Stl = df.loc[(303+294+123+200):, :]
df_list = [Clv, Hng, Swtz, Lb, Stl]

for dataframe in df_list:
    plt.figure()  # new figure per segment so the plots don't overlap
    sns.countplot(data=dataframe, x='Sex')

plt.show()
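
Since the question asks for subplots, the same loop can also draw every segment into one figure by passing each Axes to sns.countplot() through its ax= parameter. A minimal sketch, assuming a synthetic 1000-row stand-in for "heart.csv"; the segments dict and the figure size are illustrative choices, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for heart.csv (assumption: 1000 rows with a 'Sex' column)
rng = np.random.default_rng(2021)
df = pd.DataFrame({'Sex': rng.choice(['F', 'M'], 1000)})

# Same label-based slices as above, keyed by segment name
segments = {
    'Clv': df.loc[0:302],
    'Hng': df.loc[303:596],
    'Swtz': df.loc[597:719],
    'Lb': df.loc[720:919],
    'Stl': df.loc[920:],
}

# One row of axes, one panel per segment, sharing the y-axis for comparison
fig, axes = plt.subplots(1, len(segments), figsize=(15, 3), sharey=True)
for ax, (name, segment) in zip(axes, segments.items()):
    # order= keeps the bars in the same position in every panel
    sns.countplot(data=segment, x='Sex', order=['F', 'M'], ax=ax)
    ax.set_title(name)
    ax.set_xlabel('')

fig.tight_layout()
plt.show()
```

With sharey=True the panels use a common count scale, which makes differences in segment size visible; drop it if each segment should fill its own panel.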

Upvotes: 0
