messy748
messy748

Reputation: 327

Python dataframes: issue when attempting to group by multiple columns

I am attempting to group by multiple columns and return the sum of select columns in my dataframe. I was able to do this by only grouping by one column

df_sum = df.iloc[:, 27:].groupby(df['id']).sum().reset_index()

I am successfully grouping by id and summing the values from column 27 to the end of my dataframe. However, when I try to group by multiple columns using

df_sum = df.iloc[:, 27:].groupby(df['id', 'year']).sum().reset_index()

I get an error

KeyError: ('id', 'year')

I have been able to group by multiple columns before by using something like

df.groupby(['id', 'year'])['some col name'].nunique().reset_index()

I tried using this type of format

df.groupby(['id', 'year'])[df.iloc[:,27:]].sum().reset_index()

But it also does not work as I get another error

TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed

I'm currently a bit confused on how I can get this to work. I feel like I must be overlooking something relatively simple since I am able to get it to work for grouping by one column. I am just struggling to find a way to get it to work when grouping by multiple columns.

Upvotes: 1

Views: 1641

Answers (2)

valkyrie55
valkyrie55

Reputation: 505

Putting the columns you want to group by inside a list will solve the issue.

grouped_df = df.groupby(['col1', 'col2', 'col3'])['marks'].sum()

Upvotes: 0

Valdi_Bo
Valdi_Bo

Reputation: 30981

Check whether id column is within the subset "from 27-th column to the end".

I suppose that this column is at some earlier position in your df.

Another solution to try

Generate the target list of columns:

cols = df.columns[27:]

Then use it in your instruction:

df.groupby(['id', 'year'])[cols].sum().reset_index()

Note that cols itself is a list, but in the instruction above it must be "enveloped" with another pair of square brackets.

Upvotes: 2

Related Questions