carpediem

Reputation: 437

Pandas groupby and aggregate over multiple lists

I have a DataFrame df with two columns. I want to group by one column and aggregate the lists that belong to the same group into an "average list" (the element-wise mean of those lists), as follows:

column_a, column_b
1,         [1,2,3]
1,         [2,5,1]
2,         [5,6,6]
3,         [2,0,1]
3,         [4,2,3]

The lists are always of the same fixed length. The desired output should be as follows:

group, avg_list
1,     [1.5,3.5,2]
2,     [5,6,6]
3,     [3,1,2]

I know I can use groupby on the DataFrame and then aggregate, but I'm not sure what to pass to agg.

df.groupby('column_a').agg(?)

I would appreciate any suggestions.

Upvotes: 3

Views: 1362

Answers (2)

DavideBrex

Reputation: 2424

You can get the average of the lists within each group in this way:

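import numpy as np
import pandas as pd

# Stack each group's lists into a 2-D array, then take the element-wise mean (axis=0).
s = df.groupby("column_a")["column_b"].apply(lambda x: np.array(x.tolist()).mean(axis=0))
pd.DataFrame({'group':s.index, 'avg_list':s.values})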

Gives:

  group avg_list
0   1   [1.5, 3.5, 2.0]
1   2   [5.0, 6.0, 6.0]
2   3   [3.0, 1.0, 2.0]
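
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's sample data and applies the same idea. The helper name `average_lists` is made up for illustration, and converting the result back with `.tolist()` is an optional choice so that each cell holds a plain Python list rather than a NumPy array.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_a": [1, 1, 2, 3, 3],
    "column_b": [[1, 2, 3], [2, 5, 1], [5, 6, 6], [2, 0, 1], [4, 2, 3]],
})

def average_lists(lists):
    """Hypothetical helper: element-wise mean of equal-length lists."""
    stacked = np.array(lists.tolist())     # shape: (rows in group, list length)
    return stacked.mean(axis=0).tolist()   # average down the rows, back to a list

s = df.groupby("column_a")["column_b"].apply(average_lists)
result = pd.DataFrame({"group": s.index, "avg_list": s.values})
print(result)
```

This relies on the lists in each group having the same length, as stated in the question; ragged lists would not stack into a 2-D array.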

Upvotes: 4

Karthik Radhakrishnan

Reputation: 944

Group the values together as a list, and then use np.mean to find the element-wise average.

df.set_index('column_a')['column_b'].groupby('column_a').apply(list).apply(lambda x: np.mean(x,0))

Result

column_a
1    [1.5, 3.5, 2.0]
2    [5.0, 6.0, 6.0]
3    [3.0, 1.0, 2.0]
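
A possible variation, not taken from either answer: because the lists have a fixed length, they can be expanded into one numeric column per position and averaged with the built-in groupby mean, which avoids calling a Python lambda per group. This is only a sketch; the names `wide`, `means`, and `avg` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "column_a": [1, 1, 2, 3, 3],
    "column_b": [[1, 2, 3], [2, 5, 1], [5, 6, 6], [2, 0, 1], [4, 2, 3]],
})

# One column per list position, indexed by the grouping key.
wide = pd.DataFrame(df["column_b"].tolist(), index=df["column_a"])

# Built-in mean per group, computed column by column.
means = wide.groupby(level=0).mean()

# Collapse each row of means back into a single list per group.
avg = pd.Series(means.to_numpy().tolist(), index=means.index, name="avg_list")
print(avg)
```

Whether this is faster than the apply-based versions depends on the data size, so it is worth timing on the real DataFrame before switching.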

Upvotes: 0
