Concatenate pandas Dataframe via groupby

Question

I have a pandas DataFrame with columns 'x', 'y', and 'z'. However, many of the x and y values are redundant. I want to take all rows that share the same x and y values and sum the third column, returning a smaller DataFrame. So given

         x     y         z
0       1      2         1
1       1      2         5
2       1      2         0
3       1      3         0
4       2      6         1

it would return:

        x      y         z
0       1      2         6
1       1      3         0
2       2      6         1

I've tried

df = df.groupby(['x', 'y'])['z'].sum

but I'm not sure how to work with grouped objects.

Brad Solomon · Accepted Answer

Very close as-is; you just need to call .sum() and then reset the index:

>>> df.groupby(['x', 'y'])['z'].sum().reset_index()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1

groupby() also has a parameter, as_index, that keeps the group keys as columns so no reset is needed:

>>> df.groupby(['x', 'y'], as_index=False)['z'].sum()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
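
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies both approaches. The variable names via_reset and via_flag are only illustrative, not part of pandas.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Option 1: aggregate, then move the group keys back into columns.
via_reset = df.groupby(["x", "y"])["z"].sum().reset_index()

# Option 2: tell groupby not to put the keys in the index at all.
via_flag = df.groupby(["x", "y"], as_index=False)["z"].sum()

print(via_reset)
print(via_flag)
```

Both variables should hold the same three-row frame; which one to use is mostly a matter of taste.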

In your question, you have df.groupby(['x', 'y'])['z'].sum without parentheses. This simply references the method .sum as a Python object without calling it, so df ends up bound to the method itself rather than to the summed result.

>>> type(df.groupby(['x', 'y'])['z'].sum)
method

>>> callable(df.groupby(['x', 'y'])['z'].sum)
True
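
To make the difference concrete, here is a small sketch on a cut-down version of the data. The names grouped, not_called, and called are hypothetical and exist only for this example.

```python
import pandas as pd

# A shortened version of the question's data.
df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 2, 6], "z": [1, 5, 1]})

grouped = df.groupby(["x", "y"])["z"]

not_called = grouped.sum   # a bound method; nothing has been computed yet
called = grouped.sum()     # a Series indexed by the (x, y) pairs

print(type(not_called))
print(type(called))
print(called.loc[(1, 2)])  # sum of z for the group x=1, y=2
```

Only the second form gives you something that .reset_index() can turn back into a DataFrame.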
