TjS
TjS

Reputation: 327

Filter results after using group-by multiple columns

I have a dataframe that looks like:

Type     Months     Year    Marks
 a        1          2018     64
 a        1          2017     69
 a        1          2016     69
 b        1          2018     78
 b        1          2017     71
 b        1          2016     91
 c        1          2018     58
 c        1          2017     65
 c        1          2016     83

 a        2          2018     58
 a        2          2017     65
 a        2          2016     83
 a        3          2018     58
 a        3          2017     65
 a        3          2016     83

I want to group by 'Type' , 'Months' , 'Year' and sum up the marks.

sumOfTypes = data.groupby(['Type','Months','Year'])['Marks'].sum()

The results looks like:

(a, 1, 2018)                  60
(a, 1, 2017)                  54
 .
 .
(c, 1, 2016)                  86           
(c, 2, 2018)                  89

However, I want the to filter the data to for Type 'a' and 'b' only.

Also, I want to vut (c, 2, 2018) to be in different dataframes columns so the result would look like:

df_grouped:

   Type     Months     Year    Marks    
    c         2        2018     89

My current code:

sumOfTypes = data.groupby(['Type','Months','Year'])['Marks'].sum()
df_grouped = pd.DataFrame(sumOfTypes)

Upvotes: 1

Views: 236

Answers (1)

jezrael
jezrael

Reputation: 863236

Add as_index=False or reset_index for return DataFrame and then filter by boolean indexing with isin:

sumOfTypes = data.groupby(['Type','Months','Year'], as_index=False)['Marks'].sum()
#alternative
#sumOfTypes = data.groupby(['Type','Months','Year'])['Marks'].sum().reset_index()

df = sumOfTypes[sumOfTypes['Type'].isin(['a','b'])]

Or filter by boolean indexing with MultiIndex Series, for select level use get_level_values:

sumOfTypes = data.groupby(['Type','Months','Year'])['Marks'].sum()

s = sumOfTypes[sumOfTypes.index.get_level_values('Type').isin(['a','b'])]

Upvotes: 1

Related Questions