Reputation: 67
I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([{'name': 'John', 'counter' : [1,1,3,5]},{'name': 'John', 'counter' : [2,0,1,5]},{'name': 'John', 'counter' : [4,1,2,2]}])
df['counter'] = df['counter'].apply(lambda x : np.array(x))
df['counter2'] = df['counter']
df['pmcount'] = 1
df
name counter counter2 pmcount
0 John [1, 1, 3, 5] [1, 1, 3, 5] 1
1 John [2, 0, 1, 5] [2, 0, 1, 5] 1
2 John [4, 1, 2, 2] [4, 1, 2, 2] 1
I need to group the data by 'name', applying np.sum, np.maximum.reduce and sum to the 'counter', 'counter2' and 'pmcount' columns respectively.
Separately, each function works fine:
result1 = df.groupby(['name'])['counter'].apply(np.sum).reset_index()
result1
name counter
0 John [7, 2, 6, 12]
result2 = df.groupby(['name'])['counter2'].apply(lambda x: np.maximum.reduce(list(x))).reset_index()
result2
name counter2
0 John [4, 1, 3, 5]
result3 = df.groupby(['name'])['pmcount'].sum().reset_index()
result3
name pmcount
0 John 3
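As a sanity check on the expected values, the two array reductions can be reproduced with plain NumPy, outside of pandas. This is only a sketch: the names `rows`, `stacked`, `elementwise_sum` and `elementwise_max` are illustrative and do not appear in the code above.

```python
import numpy as np

# The three 'counter' rows from the dataframe above.
rows = [np.array([1, 1, 3, 5]), np.array([2, 0, 1, 5]), np.array([4, 1, 2, 2])]

stacked = np.stack(rows)                 # shape (3, 4): one row per record
elementwise_sum = stacked.sum(axis=0)    # position-wise total across records
elementwise_max = stacked.max(axis=0)    # position-wise maximum across records

# np.maximum.reduce over the list of rows gives the same maximum.
same_max = np.array_equal(elementwise_max, np.maximum.reduce(rows))

print(elementwise_sum.tolist())  # [7, 2, 6, 12]
print(elementwise_max.tolist())  # [4, 1, 3, 5]
```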
But when I try to use the pandas aggregate function to assign a different function to each column, I get an error:
function_dict = {'counter': np.sum , "counter2": lambda x: np.maximum.reduce(list(x)) , 'pmcount': 'sum'}
result = df.groupby('name').agg(function_dict)
ValueError: Must produce aggregated value
Expected Result:
name counter counter2 pmcount
0 John [7, 2, 6, 12] [4, 1, 3, 5] 3
I tried storing plain lists instead of np.array in the array columns, but I got the same error, and I also could not reproduce the earlier np.sum result (even when wrapping the values with np.array() inside the lambda expression).
Upvotes: 4
Views: 682
Reputation: 13349
Try:
df = df.groupby(['name']).agg({'counter': lambda x: list(x.sum()), 'counter2': lambda x: list(x), 'pmcount': 'sum'}).reset_index()
df['counter2'] = df['counter2'].apply(lambda x: np.maximum.reduce(np.array(x)))
OR
df.groupby(['name']).agg({'counter': lambda x: list(x.sum()), 'counter2': lambda x: list(np.maximum.reduce(list(x))), 'pmcount': 'sum'}).reset_index()
df:
name counter counter2 pmcount
0 John [7, 2, 6, 12] [4, 1, 3, 5] 3
Upvotes: 1
Reputation: 1758
You have to convert the results to lists; otherwise, the result will be interpreted as a Series or DataFrame rather than as a single aggregated value:
function_dict = {'counter': lambda x: list(np.sum(x)) , "counter2": lambda x: list(np.maximum.reduce(list(x))) , 'pmcount': 'sum'}
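For completeness, here is a self-contained sketch that plugs this `function_dict` into the dataframe from the question. It assumes a pandas version in which `agg` accepts a callable returning a list for an object column; behaviour may differ across releases, so treat it as a starting point rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame([
    {"name": "John", "counter": [1, 1, 3, 5]},
    {"name": "John", "counter": [2, 0, 1, 5]},
    {"name": "John", "counter": [4, 1, 2, 2]},
])
df["counter"] = df["counter"].apply(lambda x: np.array(x))
df["counter2"] = df["counter"]
df["pmcount"] = 1

# Each array-valued aggregation returns a plain list, so agg sees one
# object per group instead of an ndarray.
function_dict = {
    "counter": lambda x: list(np.sum(x)),
    "counter2": lambda x: list(np.maximum.reduce(list(x))),
    "pmcount": "sum",
}

result = df.groupby("name").agg(function_dict).reset_index()
print(result)
```

The aggregated cells are Python lists; if arrays are needed afterwards, something like `result['counter'].apply(np.array)` converts them back.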
Upvotes: 2
Reputation: 135
No need to aggregate, as you already did the work:
gdf = df.groupby('name')  # grouped object used by the lines below
ndf = pd.DataFrame()
ndf['counter'] = gdf['counter'].apply(np.sum)
ndf['counter2'] = gdf['counter2'].apply(lambda x: np.maximum.reduce(list(x)))
ndf['pmcount'] = gdf['pmcount'].sum()
ndf.reset_index(inplace=True)
Out[1]:
name counter counter2 pmcount
0 John [7, 2, 6, 12] [4, 1, 3, 5] 3
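In the same spirit of skipping `agg`, the groups can also be iterated explicitly and the result assembled from one record per group. This is a different technique from the snippet above, sketched under the assumption that every array in a group has the same length; `records` and `out` are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame([
    {"name": "John", "counter": [1, 1, 3, 5]},
    {"name": "John", "counter": [2, 0, 1, 5]},
    {"name": "John", "counter": [4, 1, 2, 2]},
])
df["counter"] = df["counter"].apply(lambda x: np.array(x))
df["counter2"] = df["counter"]
df["pmcount"] = 1

records = []
for name, group in df.groupby("name"):
    # Stack the per-row arrays into a 2-D array, then reduce over rows.
    counter = np.stack(group["counter"].to_list())
    counter2 = np.stack(group["counter2"].to_list())
    records.append({
        "name": name,
        "counter": counter.sum(axis=0),
        "counter2": counter2.max(axis=0),
        "pmcount": int(group["pmcount"].sum()),
    })

out = pd.DataFrame(records)
print(out)
```

Because the reductions run on ordinary NumPy arrays, this version does not depend on how `agg` validates the value returned for each group.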
Upvotes: 3