Reputation: 37
This is the dataframe that I'm using: https://www.kaggle.com/spscientist/students-performance-in-exams
It contains the following columns (That I want to use):
Basically, I want a barplot that takes Race/Ethnicity to X-Axis with each containing 3 Bars, those will be Math Score, Reading Score, and Writing Score. Every Race/Ethnicity must be in the same graph each with its own Scores.
It should somewhat look like this:
I used the group_by function to get the mean for each lesson in each group:
df3 = df.groupby('race/ethnicity')[['math score', 'writing score', 'reading score']].mean().reset_index()
And it gave me this (I translated "group" to "grupo"):
So now all I need to do is find a way to make the graphic I told you, but that's what I can't figure out. Got any ideas?
I would prefer to use seaborn, but if you want to use matplotlib instead there's no problem.
Upvotes: 1
Views: 1228
Reputation: 62373
seaborn
is just a high-level API for matplotlib
.seaborn.barplot
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# load the Kaggle dataset
df = pd.read_csv('StudentsPerformance.csv')
# groupby without resetting the dataframe
df_groupby = df.groupby('race/ethnicity')[['math score', 'writing score', 'reading score']].mean()
# plot df_groupby
ax = df_groupby.plot.bar(figsize=(8, 6), width=0.7)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=0)
plt.show()
seaborn.barplot
.stack()
.estimator
for seaborn.barplot
is mean
.# stack the dataframe into a long format
df_long = df.set_index(keys=df.columns[:5].tolist()).stack().reset_index().rename(columns={'level_5': 'subject', 0: 'score'}).sort_values('race/ethnicity')
# plot with seaborn
fig, ax = plt.subplots(figsize=(8, 7))
sns.barplot(data=df_long, x='race/ethnicity', y='score', hue='subject', ci=False, ax=ax)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
import pandas as pd
data = {'gender': ['female', 'female', 'female', 'male', 'male', 'female', 'female', 'male', 'male', 'female', 'male', 'male', 'female', 'male', 'female', 'female', 'male', 'female', 'male', 'female', 'male', 'female', 'male', 'female', 'male', 'male', 'male', 'female', 'male', 'female', 'female', 'female', 'female', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'male', 'female', 'female', 'male', 'female', 'male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'male', 'female', 'male', 'male', 'male', 'female', 'female', 'male', 'male', 'female', 'male', 'female', 'female', 'male', 'female', 'male', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'female', 'female', 'female', 'male', 'male', 'male', 'female', 'male', 'male', 'female', 'female', 'female'],
'race/ethnicity': ['group B', 'group C', 'group B', 'group A', 'group C', 'group B', 'group B', 'group B', 'group D', 'group B', 'group C', 'group D', 'group B', 'group A', 'group A', 'group C', 'group C', 'group B', 'group C', 'group C', 'group D', 'group B', 'group D', 'group C', 'group D', 'group A', 'group B', 'group C', 'group C', 'group D', 'group D', 'group B', 'group E', 'group D', 'group E', 'group E', 'group D', 'group D', 'group D', 'group B', 'group C', 'group C', 'group B', 'group B', 'group E', 'group B', 'group A', 'group C', 'group D', 'group C', 'group E', 'group E', 'group C', 'group D', 'group C', 'group C', 'group E', 'group D', 'group D', 'group C', 'group E', 'group A', 'group A', 'group C', 'group D', 'group B', 'group D', 'group C', 'group B', 'group C', 'group D', 'group D', 'group A', 'group C', 'group C', 'group B', 'group E', 'group A', 'group D', 'group E', 'group B', 'group B', 'group A', 'group E', 'group D', 'group C', 'group C', 'group D', 'group A', 'group D', 'group C', 'group C', 'group C', 'group C', 'group B', 'group C', 'group B', 'group E', 'group D', 'group D'],
'parental level of education': ["bachelor's degree", 'some college', "master's degree", "associate's degree", 'some college', "associate's degree", 'some college', 'some college', 'high school', 'high school', "associate's degree", "associate's degree", 'high school', 'some college', "master's degree", 'some high school', 'high school', 'some high school', "master's degree", "associate's degree", 'high school', 'some college', 'some college', 'some high school', "bachelor's degree", "master's degree", 'some college', "bachelor's degree", 'high school', "master's degree", 'some college', 'some college', "master's degree", 'some college', 'some college', "associate's degree", "associate's degree", 'some high school', "associate's degree", "associate's degree", "associate's degree", "associate's degree", "associate's degree", 'some college', "associate's degree", "associate's degree", "associate's degree", 'high school', "associate's degree", 'high school', 'some college', "associate's degree", 'some college', 'high school', 'some high school', 'high school', "associate's degree", "associate's degree", 'some college', 'some high school', "bachelor's degree", 'some high school', "associate's degree", "associate's degree", 'some high school', 'some high school', 'some high school', 'some college', "associate's degree", "associate's degree", 'some college', 'some college', "associate's degree", 'some high school', 'some high school', "associate's degree", 'some high school', "bachelor's degree", 'some high school', "master's degree", "associate's degree", 'high school', 'some college', "associate's degree", 'high school', 'some college', 'some college', "associate's degree", 'some college', 'some high school', "bachelor's degree", 'high school', 'high school', "associate's degree", 'some college', "associate's degree", 'some high school', 'some college', 'some college', "bachelor's degree"], 'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'standard'], 'test preparation course': ['none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'completed', 'completed', 'none', 'completed', 'none', 'none', 'completed', 'none', 'completed', 'none', 'completed', 'none', 'completed', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'completed', 'completed', 'none', 'none', 'none', 'none', 'none', 'completed', 'completed', 'none', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'completed', 'none', 'completed', 'completed', 'completed', 'none', 'none'],
'math score': [72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54, 66, 65, 44, 69, 74, 73, 69, 67, 70, 62, 69, 63, 56, 40, 97, 81, 74, 50, 75, 57, 55, 58, 53, 59, 50, 65, 55, 66, 57, 82, 53, 77, 53, 88, 71, 33, 82, 52, 58, 0, 79, 39, 62, 69, 59, 67, 45, 60, 61, 39, 58, 63, 41, 61, 49, 44, 30, 80, 61, 62, 47, 49, 50, 72, 42, 73, 76, 71, 58, 73, 65, 27, 71, 43, 79, 78, 65, 63, 58, 65], 'reading score': [72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58, 69, 75, 54, 73, 71, 74, 54, 69, 70, 70, 74, 65, 72, 42, 87, 81, 81, 64, 90, 56, 61, 73, 58, 65, 56, 54, 65, 71, 74, 84, 55, 69, 44, 78, 84, 41, 85, 55, 59, 17, 74, 39, 61, 80, 58, 64, 37, 72, 58, 64, 63, 55, 51, 57, 49, 41, 26, 78, 74, 68, 49, 45, 47, 64, 39, 80, 83, 71, 70, 86, 72, 34, 79, 45, 86, 81, 66, 72, 67, 67], 'writing score': [74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61, 63, 70, 53, 73, 80, 72, 55, 75, 65, 75, 74, 61, 65, 38, 82, 79, 83, 59, 88, 57, 54, 68, 65, 66, 54, 57, 62, 76, 76, 82, 48, 68, 42, 75, 87, 43, 86, 49, 58, 10, 72, 34, 55, 71, 59, 61, 37, 74, 56, 57, 73, 63, 48, 56, 41, 38, 22, 81, 72, 68, 50, 45, 54, 63, 34, 82, 88, 74, 67, 82, 74, 36, 71, 50, 92, 82, 62, 70, 62, 62]}
df = pd.DataFrame(data)
Upvotes: 1
Reputation: 508
I would reform your data so it works easier with catplot
:
df3 = df.groupby('race/ethnicity')[['math score', 'writing score', 'reading score']].mean().reset_index()
df_scores = pd.DataFrame(columns=["Score", "Subject", "Race/Ethnicity"])
for subject in ["math", "reading", "writing"]:
df_tmp1 = pd.DataFrame({"Score":df3[subject+" score"], "Race/Ethnicity":df3["race/ethnicity"]})
df_tmp1["Subject"] = subject
df_scores = df_scores.append(df_tmp1)
This gives you a df_scores
table like this:
Score | Subject | Race/Ethnicity |
---|---|---|
61.6292 | math | group A |
63.4526 | math | group B |
64.4639 | math | group C |
67.3626 | math | group D |
73.8214 | math | group E |
64.6742 | reading | group A |
67.3526 | reading | group B |
69.1034 | reading | group C |
70.0305 | reading | group D |
73.0286 | reading | group E |
62.6742 | writing | group A |
65.6 | writing | group B |
67.8276 | writing | group C |
70.145 | writing | group D |
71.4071 | writing | group E |
Then plot it:
g = sns.catplot(data=df_scores, kind="bar",
x="Race/Ethnicity",
y="Score",
hue="Subject")
Upvotes: 0