Ferran

Reputation: 840

Pandas GroupBy list values in a column of lists and find their mean

I have a large pandas dataframe (1M rows) with the following format:

import pandas as pd

data = {
    'names': {0: ['Lily', 'Kerry', 'Mona'], 1: ['Kerry', 'Mona'], 2: ['Mona']},
    'sentiment': {0: 10, 1: 2, 2: 0}
}
df = pd.DataFrame(data)
df

                 names  sentiment
0  [Lily, Kerry, Mona]         10
1        [Kerry, Mona]          2
2               [Mona]          0

I would like to compute the average sentiment for each unique name in the names column, resulting in the following:

   names  sentiment
0   Lily         10
1  Kerry          6
2   Mona          4

The number of unique names is very large, so efficiency is important.

Upvotes: 2

Views: 51

Answers (1)

cs95

Reputation: 402942

This requires exploding the "names" column first, so that each name gets its own row, followed by a standard GroupBy.mean():

df.explode('names').groupby('names', as_index=False, sort=False).mean()

   names  sentiment
0   Lily       10.0
1  Kerry        6.0
2   Mona        4.0
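
For reference, here is a self-contained sketch of the same approach that can be run as-is. It assumes pandas 0.25 or later, where DataFrame.explode is available. The function name mean_sentiment_by_name is a hypothetical helper, not part of pandas, and the 'sentiment' column is selected explicitly so any other columns in a real dataframe would be left out of the mean.

```python
import pandas as pd


def mean_sentiment_by_name(df):
    """Hypothetical helper: average 'sentiment' per unique entry in 'names'."""
    # explode() gives each list element its own row, repeating the
    # sentiment value (and the original index) for every name in the list.
    exploded = df.explode('names')
    # sort=False keeps names in order of first appearance;
    # as_index=False returns 'names' as a regular column.
    return exploded.groupby('names', as_index=False, sort=False)['sentiment'].mean()


df = pd.DataFrame({
    'names': [['Lily', 'Kerry', 'Mona'], ['Kerry', 'Mona'], ['Mona']],
    'sentiment': [10, 2, 0],
})

result = mean_sentiment_by_name(df)
print(result)
```

Here Kerry appears in rows 0 and 1, so the mean is (10 + 2) / 2 = 6, and Mona appears in all three rows, so the mean is (10 + 2 + 0) / 3 = 4. Because mean() returns floats, the sentiment column comes back as float64 even though the input is integer.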

Upvotes: 3
