Reputation: 531
I have a data frame that looks like the one below. The words column
holds the number of words in each email sent.
sender receiver words
a b 10
a c 5
a c 15
b a 50
b a 30
I'm relatively new to pandas. I'd like to calculate the harmonic mean of 1) the number of emails sent between each pair and 2) the total number of words sent between two people. How do I use hmean() from scipy.stats to obtain the desired output?
sender receiver total_emails total_words
a b hmean([10])
a c hmean([5,15])
b a hmean([50,30])
For the total number of emails, I am not sure what should be the correct formula. Any help would be appreciated!
Upvotes: 4
Views: 1358
Reputation: 14949
You can use groupby:
from scipy import stats
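# Select the words column first so agg returns a Series; Series.reset_index accepts name=, DataFrame.reset_index does not.
df = df.groupby(['sender', 'receiver'])['words'].agg(stats.hmean).reset_index(name='total_words')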
OUTPUT:
sender receiver total_words
0 a b 10.0
1 a c 7.5
2 b a 37.5
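The question also asks about the number of emails per pair, which the snippet above does not cover. Here is a minimal sketch of one way to get both columns at once with named aggregation. It assumes that total_emails simply means the row count per (sender, receiver) pair; that is my reading of the question, not something it states. The harmonic mean of a single count is just that count, so no hmean is applied to it. The sample frame is rebuilt from the question's data, and summary is only an illustrative variable name.

```python
import pandas as pd
from scipy import stats

# Sample data from the question: one row per email, words is its word count.
df = pd.DataFrame({
    "sender":   ["a", "a", "a", "b", "b"],
    "receiver": ["b", "c", "c", "a", "a"],
    "words":    [10, 5, 15, 50, 30],
})

# Named aggregation on the words column:
#   total_emails -> number of rows (emails) in each sender/receiver group
#   total_words  -> harmonic mean of the word counts in each group
# Assumption: "total emails" is taken to mean the plain row count per pair.
summary = (
    df.groupby(["sender", "receiver"])["words"]
      .agg(total_emails="size", total_words=stats.hmean)
      .reset_index()
)

print(summary)
```

The total_words column should match the output shown above, and total_emails holds the per-pair counts. If you need a different statistic for the counts, swap "size" for another aggregation.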
Upvotes: 5