bison72

Group by a specific column and apply a function to each group in Python

I am learning to apply a self-defined function to each group in a DataFrame. Let's say I have the following DataFrame:

A       B       C
1       4       3
1       5       4
1       2       10
2       7       2
2       4       4
2       6       6

I defined a simple function that, for each group in column 'A', takes the length of column 'B' and the total of column 'C', then adds the two together to produce column 'D'. Therefore I expect the following output:

A       D
1       20
2       15
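To make the expected numbers concrete, the per-group arithmetic can be checked in plain Python before involving pandas. This is only a sketch that hard-codes the rows of the table above; the names rows, groups and expected are illustrative. For A = 1 it computes 3 + (3 + 4 + 10) = 20, and for A = 2 it computes 3 + (2 + 4 + 6) = 15.

```python
# Rows (A, B, C) copied from the example table above
rows = [(1, 4, 3), (1, 5, 4), (1, 2, 10),
        (2, 7, 2), (2, 4, 4), (2, 6, 6)]

# Collect the B and C values that belong to each value of A
groups = {}
for a, b, c in rows:
    bs, cs = groups.setdefault(a, ([], []))
    bs.append(b)
    cs.append(c)

# D = number of B values in the group + sum of the C values
expected = {a: len(bs) + sum(cs) for a, (bs, cs) in groups.items()}
print(expected)  # {1: 20, 2: 15}
```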

I ran the code below but could not get what I want:

>>> import pandas as pd
>>> 
>>> df = pd.read_csv("foo.txt", sep="\t")
>>> df
   A  B   C
0  1  4   3
1  1  5   4
2  1  2  10
3  2  7   2
4  2  4   4
5  2  6   6
>>> 
>>> def someFunction(x, y):
...         length = len(x)
...         total = sum(y)
...         number = length + total
...         print(number)
... 
>>> f = lambda x: someFunction(x['B'], x['C'])
>>> output = df.groupby(['A']).apply(f)
20
20
15
>>> output
Empty DataFrame
Columns: []
Index: []
>>> 

How do I get the desired output? Thanks in advance.


Answers (2)

smerllo

This should do the job:

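import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2],
    'B': [4, 5, 2, 7, 4, 6],
    'C': [3, 4, 10, 2, 4, 6],
})

def someFunction(data):
    # data is the sub-DataFrame holding the rows for one value of A
    return len(data['B']) + data['C'].sum()

# Apply to each group; the result is a Series indexed by A
df.groupby('A').apply(someFunction)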

Output:

A
1    20
2    15
dtype: int64

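The key change is that the function returns its result instead of printing it: the original someFunction returned None for every group, which is why apply gave back an empty DataFrame. It is also more convenient to let the function take the whole group DataFrame rather than separate x and y arguments.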
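If you prefer to keep the original two-argument signature, a minimal sketch is to make someFunction return its value and call it from a lambda, as in the question. This assumes the same data as above; reset_index(name='D') turns the resulting Series into the A/D table the question asks for.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2],
    'B': [4, 5, 2, 7, 4, 6],
    'C': [3, 4, 10, 2, 4, 6],
})

def someFunction(x, y):
    # Return the value instead of printing it, so apply can collect it
    return len(x) + y.sum()

# Each group arrives as a DataFrame; pass its B and C columns separately
output = df.groupby('A').apply(lambda g: someFunction(g['B'], g['C']))

# Turn the Series (indexed by A) into a two-column table with D as the values
result = output.reset_index(name='D')
print(result)
```

Only the B and C columns are used inside the lambda, so this should behave the same whether or not your pandas version passes the grouping column A into each group.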


yatu

You can use GroupBy.agg to apply a different aggregation function to each column, and then sum across the columns with axis=1:

df.groupby('A').agg({'B': 'size', 'C': 'sum'}).sum(axis=1).reset_index(name='D')

   A   D
0  1  20
1  2  15
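A variant of the same idea, sketched here with named aggregation (available since pandas 0.25), keeps the intermediate count and total as labelled columns before adding them, which can make the calculation easier to inspect. The column names n_B and sum_C are illustrative, not required by pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2],
    'B': [4, 5, 2, 7, 4, 6],
    'C': [3, 4, 10, 2, 4, 6],
})

# One aggregation per output column: row count of B and sum of C for each A
agg = df.groupby('A').agg(n_B=('B', 'size'), sum_C=('C', 'sum'))

# D is the row-wise sum of the two aggregates; reset_index makes A a column
result = agg.assign(D=agg['n_B'] + agg['sum_C']).reset_index()[['A', 'D']]
print(result)
```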
