Jeferson Correa

Reputation: 67

Pandas groupby aggregate/apply specific functions to specific columns (np.sum, sum)

I have the following dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame([{'name': 'John', 'counter' : [1,1,3,5]},{'name': 'John', 'counter' : [2,0,1,5]},{'name': 'John', 'counter' : [4,1,2,2]}])
df['counter'] = df['counter'].apply(lambda x : np.array(x))
df['counter2'] = df['counter']
df['pmcount'] = 1

df
   name       counter      counter2  pmcount
0  John  [1, 1, 3, 5]  [1, 1, 3, 5]        1
1  John  [2, 0, 1, 5]  [2, 0, 1, 5]        1
2  John  [4, 1, 2, 2]  [4, 1, 2, 2]        1

I need to group the data by 'name', applying np.sum, np.maximum.reduce and sum to the 'counter', 'counter2' and 'pmcount' columns respectively.


Separately, each function works fine:

result1 = df.groupby(['name'])['counter'].apply(np.sum).reset_index()
result1
   name        counter
0  John  [7, 2, 6, 12]

result2 = df.groupby(['name'])['counter2'].apply(lambda x: np.maximum.reduce(list(x))).reset_index()
result2
   name      counter2
0  John  [4, 1, 3, 5]

result3 = df.groupby(['name'])['pmcount'].sum().reset_index()
result3
   name  pmcount
0  John        3

But when I try to use the pandas aggregate function to assign a specific function to each column, I get an error:

function_dict = {'counter': np.sum , "counter2": lambda x: np.maximum.reduce(list(x)) , 'pmcount': 'sum'}
result = df.groupby('name').agg(function_dict)

ValueError: Must produce aggregated value

Expected Result:

   name        counter      counter2  pmcount
0  John  [7, 2, 6, 12]  [4, 1, 3, 5]        3
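
As a sanity check, the expected row can be reproduced with plain NumPy, outside pandas. This is only a minimal sketch of what the three reductions compute for the single 'John' group; the variable names (`rows`, `counter_sum`, `counter_max`, `pmcount_sum`) are illustrative and not part of the original code.

```python
import numpy as np

# The three 'counter' arrays of the single group ('John') from the question.
rows = [np.array([1, 1, 3, 5]), np.array([2, 0, 1, 5]), np.array([4, 1, 2, 2])]

# Element-wise sum across the three rows (what np.sum does per group).
counter_sum = np.sum(rows, axis=0)

# Element-wise maximum across the three rows.
counter_max = np.maximum.reduce(rows)

# Plain sum of the scalar 'pmcount' column (three rows, each equal to 1).
pmcount_sum = sum([1, 1, 1])

print(counter_sum.tolist())  # [7, 2, 6, 12]
print(counter_max.tolist())  # [4, 1, 3, 5]
print(pmcount_sum)           # 3
```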

I tried using lists instead of np.array in the array columns, but I got the same error, and I also couldn't reproduce the earlier np.sum result (even when wrapping the values in np.array() inside the lambda expression).

Upvotes: 4

Views: 682

Answers (3)

Pygirl

Reputation: 13349

Try:

df = df.groupby(['name']).agg({'counter': lambda x: list(x.sum()), 'counter2': lambda x: list(x), 'pmcount': 'sum'}).reset_index()
df['counter2'] = df['counter2'].apply(lambda x: np.maximum.reduce(np.array(x)))

OR

df.groupby(['name']).agg({'counter': lambda x: list(x.sum()), 'counter2': lambda x: list(np.maximum.reduce(list(x))), 'pmcount': 'sum'}).reset_index()

df:

    name    counter         counter2        pmcount
0   John    [7, 2, 6, 12]   [4, 1, 3, 5]    3

Upvotes: 1

jjsantoso

Reputation: 1758

You have to convert the results to lists; otherwise, the result will be interpreted as a Series or DataFrame.

function_dict = {'counter': lambda x: list(np.sum(x)) , "counter2": lambda x: list(np.maximum.reduce(list(x))) , 'pmcount': 'sum'}
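
For completeness, here is a self-contained sketch of how such a dictionary might be used end to end with the question's data. It is a variation rather than the exact code above: it assumes that summing via `np.sum(list(x), axis=0)` and converting with `.tolist()` is acceptable, so the reductions run in NumPy and each cell ends up as a plain Python list. Behaviour may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Rebuild the question's dataframe.
df = pd.DataFrame([
    {'name': 'John', 'counter': [1, 1, 3, 5]},
    {'name': 'John', 'counter': [2, 0, 1, 5]},
    {'name': 'John', 'counter': [4, 1, 2, 2]},
])
df['counter'] = df['counter'].apply(lambda x: np.array(x))
df['counter2'] = df['counter']
df['pmcount'] = 1

# Each lambda returns a plain list, so every group yields one list-valued cell.
function_dict = {
    'counter': lambda x: np.sum(list(x), axis=0).tolist(),      # element-wise sum
    'counter2': lambda x: np.maximum.reduce(list(x)).tolist(),  # element-wise max
    'pmcount': 'sum',                                           # ordinary scalar sum
}

result = df.groupby('name').agg(function_dict).reset_index()
print(result)
```

If NumPy arrays are needed afterwards, the list cells can be converted back, for example with `result['counter'].apply(np.array)`.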

Upvotes: 2

apaolillo

Reputation: 135

No need to aggregate, as you already did the work:

  1. Factor-out the "group by" operation
  2. DO NOT reset indexes (at the intermediate step)
  3. Create a new dataframe that will have the same index

gdf = df.groupby('name')  # step 1: group once and reuse the GroupBy object
ndf = pd.DataFrame()
ndf['counter'] = gdf['counter'].apply(np.sum)
ndf['counter2'] = gdf['counter2'].apply(lambda x: np.maximum.reduce(list(x)))
ndf['pmcount'] = gdf['pmcount'].sum()
ndf.reset_index(inplace=True)

Out[1]: 
   name        counter      counter2  pmcount
0  John  [7, 2, 6, 12]  [4, 1, 3, 5]        3

Upvotes: 3
