Reputation: 327
I am attempting to group by multiple columns and return the sum of select columns in my dataframe. I was able to do this by only grouping by one column
df_sum = df.iloc[:, 27:].groupby(df['id']).sum().reset_index()
I am successfully grouping by id
and summing the values from column 27 to the end of my dataframe. However, when I try to group by multiple columns using
df_sum = df.iloc[:, 27:].groupby(df['id', 'year']).sum().reset_index()
I get an error
KeyError: ('id', 'year')
I have been able to group by multiple columns before by using something like
df.groupby(['id', 'year'])['some col name'].nunique().reset_index()
I tried using this type of format
df.groupby(['id', 'year'])[df.iloc[:,27:]].sum().reset_index()
But it also does not work as I get another error
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
I'm currently a bit confused on how I can get this to work. I feel like I must be overlooking something relatively simple since I am able to get it to work for grouping by one column. I am just struggling to find a way to get it to work when grouping by multiple columns.
Upvotes: 1
Views: 1641
Reputation: 505
Putting the columns you want to group by inside a list will solve the issue.
grouped_df = df.groupby(['col1', 'col2', 'col3'])['marks'].sum()
Upvotes: 0
Reputation: 30981
Check whether id column is within the subset "from 27-th column to the end".
I suppose that this column is at some earlier position in your df.
Generate the target list of columns:
cols = df.columns[27:]
Then use it in your instruction:
df.groupby(['id', 'year'])[cols].sum().reset_index()
Note that cols itself is a list, but in the instruction above it must be "enveloped" with another pair of square brackets.
Upvotes: 2