Reputation: 109
I have a pandas DataFrame with columns 'x', 'y', and 'z'. However, many of the x and y values are redundant. I want to take all rows that have the same x and y values and sum the third column, returning a smaller DataFrame. So given
   x  y  z
0  1  2  1
1  1  2  5
2  1  2  0
3  1  3  0
4  2  6  1
it would return:
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
I've tried
df = df.groupby(['x', 'y'])['z'].sum
but I'm not sure how to work with grouped objects.
Upvotes: 1
Views: 68
Reputation: 40878
Very close as-is; you just need to call .sum() and then reset the index:
>>> df.groupby(['x', 'y'])['z'].sum().reset_index()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
There is also a parameter to groupby(), as_index=False, that handles that:
>>> df.groupby(['x', 'y'], as_index=False)['z'].sum()
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
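To try both variants side by side, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample data in the question; the names via_reset and via_flag are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Variant 1: aggregate, then move the (x, y) index back into columns.
via_reset = df.groupby(["x", "y"])["z"].sum().reset_index()

# Variant 2: ask groupby not to build an (x, y) index in the first place.
via_flag = df.groupby(["x", "y"], as_index=False)["z"].sum()

print(via_reset)
print(via_flag)
```

Both calls should give a three-row DataFrame with columns x, y, and z; which one to use is mostly a matter of taste.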
In your question, you have df.groupby(['x', 'y'])['z'].sum without parentheses. This simply references the method .sum as a Python object, without calling it.
>>> type(df.groupby(['x', 'y'])['z'].sum)
method
>>> callable(df.groupby(['x', 'y'])['z'].sum)
True
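The difference between referencing and calling can be seen in a small sketch, again assuming the sample data from the question (not_called and called are illustrative names):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

grouped = df.groupby(["x", "y"])["z"]

not_called = grouped.sum   # a bound method; nothing has been aggregated yet
called = grouped.sum()     # a Series of sums, indexed by (x, y)

print(callable(not_called))            # the method object can still be called later
print(isinstance(called, pd.Series))   # the call produces the actual result
```

Assigning not_called back to df, as in the question, replaces the DataFrame with the method object rather than with the summed data.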
Upvotes: 3
Reputation: 153460
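Another option without using groupby syntax is to set x and y as the index and sum over the index levels. This relies on the level argument to sum(), which was deprecated in pandas 1.3 and removed in pandas 2.0, so it only works on older versions:
df.set_index(['x','y']).sum(level=[0,1]).reset_index()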
Output:
   x  y  z
0  1  2  6
1  1  3  0
2  2  6  1
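The same index-based idea can also be written with groupby(level=...), which runs on current pandas releases. This is a sketch of a close equivalent using the question's sample data, not the one-liner from this answer:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "x": [1, 1, 1, 1, 2],
    "y": [2, 2, 2, 3, 6],
    "z": [1, 5, 0, 0, 1],
})

# Move x and y into a MultiIndex, group on both index levels, sum the
# remaining column, then turn the index levels back into columns.
result = (
    df.set_index(["x", "y"])
      .groupby(level=[0, 1])
      .sum()
      .reset_index()
)

print(result)
```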
Upvotes: 0