Reputation: 725
I have data in a pandas DataFrame or NumPy array and want to calculate the weighted mean (average) or weighted median, using weights from another column or array. I am looking for a simple solution rather than writing the functions from scratch or copy-pasting them everywhere I need them.
The data looks like this:
state.head()
        State  Population  Murder.Rate Abbreviation
0     Alabama     4779736          5.7           AL
1      Alaska      710231          5.6           AK
2     Arizona     6392017          4.7           AZ
3    Arkansas     2915918          5.6           AR
4  California    37253956          4.4           CA
I want to calculate the weighted mean or weighted median of the murder rate, taking into account the different populations of the states.
How can I do that?
Upvotes: 1
Views: 1754
Reputation: 725
First, install the weightedstats library for Python:
pip install weightedstats
Then import it and call the functions as follows:
import weightedstats as ws
Weighted Mean
ws.weighted_mean(state['Murder.Rate'], weights=state['Population'])
4.445833981123394
Weighted Median
ws.weighted_median(state['Murder.Rate'], weights=state['Population'])
4.4
The library also provides NumPy-based versions of the weighted mean and median. The functions above already work on their own, so use these only if you need them.
my_data = [1, 2, 3, 4, 5]
my_weights = [10, 1, 1, 1, 9]
ws.numpy_weighted_mean(my_data, weights=my_weights)
ws.numpy_weighted_median(my_data, weights=my_weights)
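If installing an extra package is not an option, the same two statistics can be approximated with NumPy alone. The following is only a minimal sketch, not how weightedstats is implemented: `weighted_median_sketch` is a hypothetical helper name, the arrays hold just the five rows shown in the question (so the numbers differ from the full-table results above), and the median uses the "lower weighted median" convention, which may differ from a library that averages two values when the cumulative weight lands exactly on the halfway point.

```python
import numpy as np

# First five rows from the question only (assumption: illustrative subset,
# not the full state table).
murder_rate = np.array([5.7, 5.6, 4.7, 5.6, 4.4])
population = np.array([4779736, 710231, 6392017, 2915918, 37253956], dtype=float)

# Weighted mean: sum(w * x) / sum(w)
mean_np = np.average(murder_rate, weights=population)


def weighted_median_sketch(values, weights):
    """Hypothetical helper: lower weighted median.

    Returns the smallest value at which the cumulative weight reaches
    at least half of the total weight.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values, kind="stable")   # sort values ascending
    values, weights = values[order], weights[order]
    cumulative = np.cumsum(weights)
    idx = np.searchsorted(cumulative, 0.5 * cumulative[-1])
    return values[idx]


median_np = weighted_median_sketch(murder_rate, population)
print(mean_np, median_np)
```

For these five rows California carries more than half of the total population, so its murder rate is the weighted median.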
Upvotes: 3