I have a large pandas DataFrame (1M rows) in the following format:
import pandas as pd

data = {
'names': {0: ['Lily', 'Kerry', 'Mona'], 1: ['Kerry', 'Mona'], 2: ['Mona']},
'sentiment': {0: 10, 1: 2, 2: 0}
}
df = pd.DataFrame(data)
df
names sentiment
0 [Lily, Kerry, Mona] 10
1 [Kerry, Mona] 2
2 [Mona] 0
I would like to compute the average sentiment for each unique name in the names column, resulting in the following:
names sentiment
0 Lily 10
1 Kerry 6
2 Mona 4
The number of unique names is very large, so efficiency is important.
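To pin down the target numbers, here is a plain-Python reference computation on the sample data. It is a minimal sketch meant only to check the expected result (Kerry = (10 + 2) / 2 = 6, Mona = (10 + 2 + 0) / 3 = 4), not an efficient solution for 1M rows. The variable names `totals` and `counts` are illustrative.

```python
from collections import defaultdict

# Same sample data as above, as plain Python lists
names = [['Lily', 'Kerry', 'Mona'], ['Kerry', 'Mona'], ['Mona']]
sentiment = [10, 2, 0]

# Running sum and count of sentiment per name
totals = defaultdict(float)
counts = defaultdict(int)
for row_names, score in zip(names, sentiment):
    for name in row_names:
        totals[name] += score
        counts[name] += 1

# Average = sum / count for each name, in order of first appearance
averages = {name: totals[name] / counts[name] for name in totals}
print(averages)  # {'Lily': 10.0, 'Kerry': 6.0, 'Mona': 4.0}
```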
This requires exploding the "names" column first, so that each name gets its own row, followed by a standard GroupBy.mean():
df.explode('names').groupby('names', as_index=False, sort=False).mean()
names sentiment
0 Lily 10.0
1 Kerry 6.0
2 Mona 4.0
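Below is a self-contained version of the explode + groupby approach, together with an alternative that builds the long (name, sentiment) format directly by flattening the lists and using `np.repeat` instead of `explode`. This is a sketch: whether the NumPy variant is actually faster on 1M rows is an assumption you should check by timing both on your own data. Note that `mean()` returns floats, and a name that appears twice in the same list is counted twice in both approaches.

```python
from itertools import chain

import numpy as np
import pandas as pd

data = {
    'names': {0: ['Lily', 'Kerry', 'Mona'], 1: ['Kerry', 'Mona'], 2: ['Mona']},
    'sentiment': {0: 10, 1: 2, 2: 0},
}
df = pd.DataFrame(data)

# Approach from the answer: one row per (name, sentiment) pair, then a grouped mean
exploded = df.explode('names')
result = exploded.groupby('names', as_index=False, sort=False)['sentiment'].mean()

# Alternative: flatten the lists and repeat each row's sentiment once per name
lengths = df['names'].map(len).to_numpy()
flat_names = list(chain.from_iterable(df['names']))
flat_scores = np.repeat(df['sentiment'].to_numpy(), lengths)
result_np = (
    pd.Series(flat_scores, index=flat_names)
    .groupby(level=0, sort=False)   # group by name, keep first-appearance order
    .mean()
    .rename_axis('names')
    .reset_index(name='sentiment')
)

print(result)
print(result_np)
```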