Reputation: 437
I have a data frame df with two columns. I want to group by one column and aggregate the lists that belong to the same group into an "average list" (the element-wise mean of the group's lists), as follows:
column_a, column_b
1, [1,2,3]
1, [2,5,1]
2, [5,6,6]
3, [2,0,1]
3, [4,2,3]
The lists are always of the same fixed length. The desired output should be as follows:
group, avg_list
1, [1.5,3.5,2]
2, [5,6,6]
3, [3,1,2]
I know I can use groupby on the data frame and then aggregate the result, but I'm not sure what to put in the agg part of the code.
df.groupby('column_a').agg(?)
I would appreciate any suggestions.
Upvotes: 3
Views: 1362
Reputation: 2424
You can get the average of the lists within each group in this way:
import numpy as np
import pandas as pd
s = df.groupby("column_a")["column_b"].apply(lambda x: np.array(x.tolist()).mean(axis=0))
pd.DataFrame({'group':s.index, 'avg_list':s.values})
Gives:
group avg_list
0 1 [1.5, 3.5, 2.0]
1 2 [5.0, 6.0, 6.0]
2 3 [3.0, 1.0, 2.0]
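For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The data frame is rebuilt from the sample rows in the question, and the helper name average_lists is only an illustrative choice, not something from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "column_a": [1, 1, 2, 3, 3],
    "column_b": [[1, 2, 3], [2, 5, 1], [5, 6, 6], [2, 0, 1], [4, 2, 3]],
})

def average_lists(group):
    # Hypothetical helper name. Stacks the group's equal-length lists into
    # a 2-D array (one row per list) and averages down the rows, which
    # yields one mean per list position.
    return np.array(group.tolist()).mean(axis=0)

s = df.groupby("column_a")["column_b"].apply(average_lists)
result = pd.DataFrame({"group": s.index, "avg_list": s.values})
print(result)
```

This relies on the lists in each group having the same length, as stated in the question; with ragged lists np.array would not produce a clean 2-D array.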
Upvotes: 4
Reputation: 944
Collect the lists in each group into a single list of lists, then use np.mean along axis 0 to get the element-wise average.
df.set_index('column_a')['column_b'].groupby('column_a').apply(list).apply(lambda x: np.mean(x, axis=0))
Result
column_a
1 [1.5, 3.5, 2.0]
2 [5.0, 6.0, 6.0]
3 [3.0, 1.0, 2.0]
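The two steps can also be written out separately, which makes the intermediate list of lists visible. This is only a sketch using the sample data from the question; the variable names grouped and avg are made up for illustration, and groupby(level="column_a") is used here as a more explicit spelling of grouping by the index name.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "column_a": [1, 1, 2, 3, 3],
    "column_b": [[1, 2, 3], [2, 5, 1], [5, 6, 6], [2, 0, 1], [4, 2, 3]],
})

# Step 1: collect each group's lists into one list of lists.
# Grouping by the index level named "column_a" after set_index.
grouped = df.set_index("column_a")["column_b"].groupby(level="column_a").apply(list)

# Step 2: average each list of lists along axis 0 (position by position).
avg = grouped.apply(lambda rows: np.mean(rows, axis=0))
print(avg)
```

Here grouped.loc[1] holds both lists for group 1, and avg.loc[1] is their element-wise mean.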
Upvotes: 0