Gabriel Silva
Gabriel Silva

Reputation: 37

How to groupby and plot the aggregated values

This is the dataframe that I'm using: https://www.kaggle.com/spscientist/students-performance-in-exams

It contains the following columns (That I want to use):

  1. Race/Ethnicity (String: GROUP A, GROUP B, GROUP C...)
  2. Math Score (Numeric: 60, 70, 80...)
  3. Reading Score (Numeric: 60, 70, 80...)
  4. Writing Score (Numeric: 60, 70, 80...)

Basically, I want a barplot that takes Race/Ethnicity to X-Axis with each containing 3 Bars, those will be Math Score, Reading Score, and Writing Score. Every Race/Ethnicity must be in the same graph each with its own Scores.

It should somewhat look like this: enter image description here

I used the group_by function to get the mean for each lesson in each group:

df3 = df.groupby('race/ethnicity')[['math score', 'writing score', 'reading score']].mean().reset_index()

And it gave me this (I translated "group" to "grupo"):

enter image description here

So now all I need to do is find a way to make the graphic I told you, but that's what I can't figure out. Got any ideas?

I would prefer to use seaborn, but if you want to use matplotlib instead there's no problem.

Upvotes: 1

Views: 1228

Answers (2)

Trenton McKinney
Trenton McKinney

Reputation: 62373

  • Addressing I would prefer to use seaborn: seaborn is just a high-level API for matplotlib.
  • There are two easy ways to generate the desired grouped plot
    1. Groupby and plot the grouped dataframe
      • The OP already has grouped the dataframe, but should not have reset the index
    2. Stack the dataframe from a wide to a long format and use seaborn.barplot

Imports and Create DataFrame

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# load the Kaggle dataset
df = pd.read_csv('StudentsPerformance.csv')

Groupby Plot

  • For the example in the OP, this is the simplest implementation
# groupby without resetting the dataframe
df_groupby = df.groupby('race/ethnicity')[['math score', 'writing score', 'reading score']].mean()

# plot df_groupby
ax = df_groupby.plot.bar(figsize=(8, 6), width=0.7)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.xticks(rotation=0)
plt.show()

Stack and use seaborn.barplot

  • This implementation requires converting the dataframe from a wide to long format, using .stack().
  • The default estimator for seaborn.barplot is mean.
# stack the dataframe into a long format
df_long = df.set_index(keys=df.columns[:5].tolist()).stack().reset_index().rename(columns={'level_5': 'subject', 0: 'score'}).sort_values('race/ethnicity')

# plot with seaborn
fig, ax = plt.subplots(figsize=(8, 7))
sns.barplot(data=df_long, x='race/ethnicity', y='score', hue='subject', ci=False, ax=ax)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Plot Result of Both

enter image description here

Reproducible DataFrame

  • Use this if the file is no longer available
import pandas as pd

data = {'gender': ['female', 'female', 'female', 'male', 'male', 'female', 'female', 'male', 'male', 'female', 'male', 'male', 'female', 'male', 'female', 'female', 'male', 'female', 'male', 'female', 'male', 'female', 'male', 'female', 'male', 'male', 'male', 'female', 'male', 'female', 'female', 'female', 'female', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'male', 'female', 'female', 'male', 'female', 'male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'male', 'female', 'male', 'male', 'male', 'female', 'female', 'male', 'male', 'female', 'male', 'female', 'female', 'male', 'female', 'male', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'male', 'male', 'male', 'male', 'female', 'female', 'female', 'female', 'female', 'female', 'male', 'male', 'male', 'female', 'male', 'male', 'female', 'female', 'female'],
        'race/ethnicity': ['group B', 'group C', 'group B', 'group A', 'group C', 'group B', 'group B', 'group B', 'group D', 'group B', 'group C', 'group D', 'group B', 'group A', 'group A', 'group C', 'group C', 'group B', 'group C', 'group C', 'group D', 'group B', 'group D', 'group C', 'group D', 'group A', 'group B', 'group C', 'group C', 'group D', 'group D', 'group B', 'group E', 'group D', 'group E', 'group E', 'group D', 'group D', 'group D', 'group B', 'group C', 'group C', 'group B', 'group B', 'group E', 'group B', 'group A', 'group C', 'group D', 'group C', 'group E', 'group E', 'group C', 'group D', 'group C', 'group C', 'group E', 'group D', 'group D', 'group C', 'group E', 'group A', 'group A', 'group C', 'group D', 'group B', 'group D', 'group C', 'group B', 'group C', 'group D', 'group D', 'group A', 'group C', 'group C', 'group B', 'group E', 'group A', 'group D', 'group E', 'group B', 'group B', 'group A', 'group E', 'group D', 'group C', 'group C', 'group D', 'group A', 'group D', 'group C', 'group C', 'group C', 'group C', 'group B', 'group C', 'group B', 'group E', 'group D', 'group D'],
        'parental level of education': ["bachelor's degree", 'some college', "master's degree", "associate's degree", 'some college', "associate's degree", 'some college', 'some college', 'high school', 'high school', "associate's degree", "associate's degree", 'high school', 'some college', "master's degree", 'some high school', 'high school', 'some high school', "master's degree", "associate's degree", 'high school', 'some college', 'some college', 'some high school', "bachelor's degree", "master's degree", 'some college', "bachelor's degree", 'high school', "master's degree", 'some college', 'some college', "master's degree", 'some college', 'some college', "associate's degree", "associate's degree", 'some high school', "associate's degree", "associate's degree", "associate's degree", "associate's degree", "associate's degree", 'some college', "associate's degree", "associate's degree", "associate's degree", 'high school', "associate's degree", 'high school', 'some college', "associate's degree", 'some college', 'high school', 'some high school', 'high school', "associate's degree", "associate's degree", 'some college', 'some high school', "bachelor's degree", 'some high school', "associate's degree", "associate's degree", 'some high school', 'some high school', 'some high school', 'some college', "associate's degree", "associate's degree", 'some college', 'some college', "associate's degree", 'some high school', 'some high school', "associate's degree", 'some high school', "bachelor's degree", 'some high school', "master's degree", "associate's degree", 'high school', 'some college', "associate's degree", 'high school', 'some college', 'some college', "associate's degree", 'some college', 'some high school', "bachelor's degree", 'high school', 'high school', "associate's degree", 'some college', "associate's degree", 'some high school', 'some college', 'some college', "bachelor's degree"], 'lunch': ['standard', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'free/reduced', 'free/reduced', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'standard', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'free/reduced', 'standard', 'standard', 'free/reduced', 'standard'], 'test preparation course': ['none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'completed', 'none', 'none', 'completed', 'none', 'completed', 'completed', 'none', 'completed', 'none', 'none', 'completed', 'none', 'completed', 'none', 'completed', 'none', 'completed', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'completed', 'completed', 'none', 'none', 'none', 'none', 'none', 'completed', 'completed', 'none', 'none', 'none', 'completed', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'none', 'completed', 'none', 'completed', 'completed', 'completed', 'none', 'none'],
        'math score': [72, 69, 90, 47, 76, 71, 88, 40, 64, 38, 58, 40, 65, 78, 50, 69, 88, 18, 46, 54, 66, 65, 44, 69, 74, 73, 69, 67, 70, 62, 69, 63, 56, 40, 97, 81, 74, 50, 75, 57, 55, 58, 53, 59, 50, 65, 55, 66, 57, 82, 53, 77, 53, 88, 71, 33, 82, 52, 58, 0, 79, 39, 62, 69, 59, 67, 45, 60, 61, 39, 58, 63, 41, 61, 49, 44, 30, 80, 61, 62, 47, 49, 50, 72, 42, 73, 76, 71, 58, 73, 65, 27, 71, 43, 79, 78, 65, 63, 58, 65], 'reading score': [72, 90, 95, 57, 78, 83, 95, 43, 64, 60, 54, 52, 81, 72, 53, 75, 89, 32, 42, 58, 69, 75, 54, 73, 71, 74, 54, 69, 70, 70, 74, 65, 72, 42, 87, 81, 81, 64, 90, 56, 61, 73, 58, 65, 56, 54, 65, 71, 74, 84, 55, 69, 44, 78, 84, 41, 85, 55, 59, 17, 74, 39, 61, 80, 58, 64, 37, 72, 58, 64, 63, 55, 51, 57, 49, 41, 26, 78, 74, 68, 49, 45, 47, 64, 39, 80, 83, 71, 70, 86, 72, 34, 79, 45, 86, 81, 66, 72, 67, 67], 'writing score': [74, 88, 93, 44, 75, 78, 92, 39, 67, 50, 52, 43, 73, 70, 58, 78, 86, 28, 46, 61, 63, 70, 53, 73, 80, 72, 55, 75, 65, 75, 74, 61, 65, 38, 82, 79, 83, 59, 88, 57, 54, 68, 65, 66, 54, 57, 62, 76, 76, 82, 48, 68, 42, 75, 87, 43, 86, 49, 58, 10, 72, 34, 55, 71, 59, 61, 37, 74, 56, 57, 73, 63, 48, 56, 41, 38, 22, 81, 72, 68, 50, 45, 54, 63, 34, 82, 88, 74, 67, 82, 74, 36, 71, 50, 92, 82, 62, 70, 62, 62]}
df = pd.DataFrame(data)

Upvotes: 1

ohtotasche
ohtotasche

Reputation: 508

I would reform your data so it works easier with catplot:

df3 = df.groupby('race/ethnicity')[['math score', 'writing score', 'reading score']].mean().reset_index()
df_scores = pd.DataFrame(columns=["Score", "Subject", "Race/Ethnicity"])

for subject in ["math", "reading", "writing"]:
    df_tmp1 = pd.DataFrame({"Score":df3[subject+" score"], "Race/Ethnicity":df3["race/ethnicity"]})
    df_tmp1["Subject"] = subject
    df_scores = df_scores.append(df_tmp1)

This gives you a df_scores table like this:

Score Subject Race/Ethnicity
61.6292 math group A
63.4526 math group B
64.4639 math group C
67.3626 math group D
73.8214 math group E
64.6742 reading group A
67.3526 reading group B
69.1034 reading group C
70.0305 reading group D
73.0286 reading group E
62.6742 writing group A
65.6 writing group B
67.8276 writing group C
70.145 writing group D
71.4071 writing group E

Then plot it:

g = sns.catplot(data=df_scores, kind="bar",
    x="Race/Ethnicity", 
    y="Score",
    hue="Subject")

enter image description here

Upvotes: 0

Related Questions