shouriha

Reputation: 109

Concatenate pandas DataFrame via groupby

I have a pandas DataFrame with columns 'x', 'y', and 'z'. However, many of the x and y values are redundant. I want to take all rows that have the same x and y values and sum the third column (z), returning a smaller DataFrame. So given

   x  y  z
0  1  2  1
1  1  2  5
2  1  2  0
3  1  3  0
4  2  6  1

it would return:

   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1

I've tried

df = df.groupby(['x', 'y'])['z'].sum

but I'm not sure how to work with grouped objects.

Upvotes: 1

Views: 68

Answers (2)

Brad Solomon

Reputation: 40878

Very close as-is; you just need to call .sum() and then reset the index:

>>> df.groupby(['x', 'y'])['z'].sum().reset_index()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1

groupby() also has an as_index parameter that keeps the group keys as regular columns, so no reset_index() call is needed:

>>> df.groupby(['x', 'y'], as_index=False)['z'].sum()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
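
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample in the question; the variable names via_reset and via_as_index are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Option 1: group, sum, then turn the (x, y) index back into columns.
via_reset = df.groupby(["x", "y"])["z"].sum().reset_index()

# Option 2: ask groupby to keep the keys as columns from the start.
via_as_index = df.groupby(["x", "y"], as_index=False)["z"].sum()

print(via_reset)
print(via_as_index)
```

Both variables should hold the same three-row frame shown above.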

In your question, you have df.groupby(['x', 'y'])['z'].sum without parentheses. This simply references the method .sum as a Python object, without calling it.

>>> type(df.groupby(['x', 'y'])['z'].sum)
method

>>> callable(df.groupby(['x', 'y'])['z'].sum)
True
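
To make the difference concrete, here is a small sketch (using a shortened, made-up sample rather than the full data from the question) that stores the uncalled method in a variable and then calls it. The name summer is just an illustration.

```python
import pandas as pd

# Shortened illustrative sample, not the exact data from the question.
df = pd.DataFrame({"x": [1, 1, 2], "y": [2, 2, 6], "z": [1, 5, 1]})

# Without parentheses we only hold a reference to the bound method;
# no aggregation has happened yet.
summer = df.groupby(["x", "y"])["z"].sum
print(callable(summer))

# Calling the reference performs the aggregation.
totals = summer().reset_index()
print(totals)
```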

Upvotes: 3

Scott Boston

Reputation: 153460

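Another option, without grouping on columns, is to move x and y into the index and sum over the index levels. Note that the level argument to sum() was deprecated in pandas 1.3 and removed in 2.0, so this form only works on older versions: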

df.set_index(['x','y']).sum(level=[0,1]).reset_index()

Output:

   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
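
On pandas 2.0 or later, where sum() no longer accepts level, the same index-based idea can be written by grouping on the index levels instead. This is a sketch of that substitution, not the form originally posted; the DataFrame is rebuilt from the question's sample.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Move x and y into a MultiIndex, aggregate over both index levels,
# then restore x and y as ordinary columns.
result = (
    df.set_index(["x", "y"])
      .groupby(level=[0, 1])
      .sum()
      .reset_index()
)
print(result)
```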

Upvotes: 0
