Reputation: 324
I am learning to apply a self-defined function to each group in a data frame. Let's say I have a data frame as below:
A B C
1 4 3
1 5 4
1 2 10
2 7 2
2 4 4
2 6 6
I defined a simple function that takes the length of column 'B' and the total of column 'C', then adds the two together for each group in column 'A' to generate column 'D'. I therefore expect the following output:
A D
1 20
2 15
I ran the code below and am not able to get what I want:
>>> import pandas as pd
>>>
>>> df = pd.read_csv("foo.txt", sep="\t")
>>> df
A B C
0 1 4 3
1 1 5 4
2 1 2 10
3 2 7 2
4 2 4 4
5 2 6 6
>>>
>>> def someFunction(x, y):
...     length = len(x)
...     total = sum(y)
...     number = length + total
...     print(number)
...
>>> f = lambda x: someFunction(x['B'], x['C'])
>>> output = df.groupby(['A']).apply(f)
20
20
15
>>> output
Empty DataFrame
Columns: []
Index: []
>>>
How do I get the desired output? Thanks in advance.
Upvotes: 1
Views: 43
Reputation: 3375
This should do the job:
import pandas as pd
df = pd.DataFrame()
df['A'] = [1, 1, 1, 2, 2, 2]
df['B'] = [4, 5, 2, 7, 4, 6]
df['C'] = [3, 4, 10, 2, 4, 6]
def someFunction(data):
    # number of rows in the group plus the sum of column C
    return len(data['B']) + sum(data['C'])
# apply to groupby
df.groupby('A').apply(someFunction)
Output[1]:
A
1 20
2 15
dtype: int64
Remember to pass a DataFrame to the function, rather than separate x and y arguments; it makes your code more convenient.
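As a minimal sketch of why the original attempt came back empty: the function in the question only prints its result, so `apply` receives `None` for every group and has nothing to assemble. The repeated `20` in the question's output is most likely older pandas evaluating the first group twice. The two-argument signature can stay if the function returns the value. The name `some_function` and the explicit `[["B", "C"]]` column selection are choices made for this illustration, not part of the original code.

```python
import pandas as pd

# Same data as in the question, built inline instead of read from foo.txt.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})


def some_function(x, y):
    # Return the value; a function that only prints hands None back to apply.
    return len(x) + y.sum()


result = (
    df.groupby("A")[["B", "C"]]  # pass only the columns the function needs
      .apply(lambda g: some_function(g["B"], g["C"]))
      .reset_index(name="D")     # turn the group key back into a column named D
)
print(result)
```

Selecting the columns before `apply` keeps the grouping column out of the frame passed to the function, which should behave the same way across pandas versions.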
Upvotes: 2
Reputation: 88236
You can use DataFrameGroupBy.agg to apply a different aggregation function to each column, and then sum the results on axis=1:
df.groupby('A').agg({'B': 'size', 'C': 'sum'}).sum(axis=1).reset_index(name='D')
A D
0 1 20
1 2 15
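To make the intermediate step visible, the one-liner above can be split in two. This is only a sketch: it uses named aggregation, and the column names `n_rows` and `c_total` are hypothetical labels chosen here for illustration.

```python
import pandas as pd

# Same data as in the question, built inline.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2],
    "B": [4, 5, 2, 7, 4, 6],
    "C": [3, 4, 10, 2, 4, 6],
})

# Step 1: one row per group, with the row count and the sum of C side by side.
per_group = df.groupby("A").agg(n_rows=("B", "size"), c_total=("C", "sum"))
print(per_group)

# Step 2: add the two columns row-wise, then move the index A back into a column.
output = per_group.sum(axis=1).reset_index(name="D")
print(output)
```

Because it relies on built-in aggregations rather than a Python function called once per group, this form tends to scale better on large frames than `apply`.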
Upvotes: 1