Pandas groupby and aggregate over multiple lists

Question

I have a data frame df with two columns. I want to group by one column and aggregate the lists that belong to the same group into an "average list", where each element is the mean of the corresponding elements of all the lists in that group:

column_a, column_b
1,         [1,2,3]
1,         [2,5,1]
2,         [5,6,6]
3,         [2,0,1]
3,         [4,2,3]
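To reproduce the example, the frame above can be built as follows. This is a minimal sketch that assumes column_b holds plain Python lists; the question does not say whether they are lists or NumPy arrays.

```python
import pandas as pd

# Sample frame from the question: column_b holds equal-length Python lists
df = pd.DataFrame({
    "column_a": [1, 1, 2, 3, 3],
    "column_b": [[1, 2, 3], [2, 5, 1], [5, 6, 6], [2, 0, 1], [4, 2, 3]],
})
print(df)
```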

The lists are always of the same fixed length. The desired output should be as follows:

group, avg_list
1,     [1.5,3.5,2]
2,     [5,6,6]
3,     [3,1,2]
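To make "average list" concrete: for group 1 the result is the position-by-position mean of [1,2,3] and [2,5,1], i.e. [(1+2)/2, (2+5)/2, (3+1)/2] = [1.5, 3.5, 2]. The sketch below expresses that operation with only the standard library; the helper name average_list is hypothetical and not part of pandas.

```python
def average_list(lists):
    """Element-wise mean of equal-length lists (hypothetical helper)."""
    # zip(*lists) groups the i-th element of every list together
    return [sum(column) / len(column) for column in zip(*lists)]

print(average_list([[1, 2, 3], [2, 5, 1]]))  # [1.5, 3.5, 2.0]
```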

I know I can use groupby on the DataFrame and then aggregate, but I'm not sure what to put in the agg part of the code:

df.groupby('column_a').agg(?)

I would appreciate any suggestions.

DavideBrex · Accepted Answer

You can get the element-wise average of the lists within each group by stacking them into a 2-D NumPy array and taking the mean along axis 0:

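import numpy as np
import pandas as pd

# Stack each group's lists into a 2-D array and average down the rows (axis=0)
s = df.groupby("column_a")["column_b"].apply(lambda x: np.array(x.tolist()).mean(axis=0))

pd.DataFrame({'group': s.index, 'avg_list': s.values})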

Gives:

  group avg_list
0   1   [1.5, 3.5, 2.0]
1   2   [5.0, 6.0, 6.0]
2   3   [3.0, 1.0, 2.0]
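An alternative that avoids calling a Python lambda per group (this is not part of the accepted answer): expand the lists into ordinary columns, use the built-in groupby mean, and fold each row back into a list. This sketch relies on the question's guarantee that all lists have the same length; with ragged lists the expanded frame would be padded with NaN and the mean would silently skip the missing positions.

```python
import pandas as pd

df = pd.DataFrame({
    "column_a": [1, 1, 2, 3, 3],
    "column_b": [[1, 2, 3], [2, 5, 1], [5, 6, 6], [2, 0, 1], [4, 2, 3]],
})

# Spread each list into its own columns (0, 1, 2), indexed by the group key
wide = pd.DataFrame(df["column_b"].tolist(), index=df["column_a"])

# Ordinary column-wise groupby mean: one row per group, one column per position
means = wide.groupby(level=0).mean()

# Collapse each row back into a plain Python list
result = pd.DataFrame({
    "group": means.index,
    "avg_list": means.values.tolist(),
})
print(result)
```

Unlike the accepted answer, avg_list here holds plain Python lists of floats rather than NumPy arrays.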
