How to calculate weighted mean and median in Python?

Question

I have data in a pandas DataFrame or NumPy array and want to calculate the weighted mean (average) or weighted median, using weights stored in another column or array. I am looking for a simple solution, rather than writing the functions from scratch or copy-pasting them everywhere I need them.

The data looks like this:

state.head()
    State    Population Murder.Rate Abbreviation
0   Alabama     4779736     5.7     AL
1   Alaska      710231      5.6     AK
2   Arizona     6392017     4.7     AZ
3   Arkansas    2915918     5.6     AR
4   California  37253956    4.4     CA

I want to calculate the weighted mean or median of the murder rate, taking each state's population into account.

How can I do that?

bhola prasad · Accepted Answer

First, install the weightedstats library:

pip install weightedstats

Then use it as follows:

Weighted Mean

import weightedstats as ws

ws.weighted_mean(state['Murder.Rate'], weights=state['Population'])
4.445833981123394
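
To make clear what a weighted mean computes, here is a minimal pure-Python sketch. The helper name weighted_mean_sketch is hypothetical and is not part of weightedstats; it only illustrates the formula (sum of value times weight, divided by the sum of the weights) on the five rows shown in the question, not on the full data set.

```python
def weighted_mean_sketch(values, weights):
    """Sum of value * weight divided by the sum of the weights.

    Hypothetical helper for illustration only; not the weightedstats code.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Only the five rows shown in the question, not the full data set
rates = [5.7, 5.6, 4.7, 5.6, 4.4]
populations = [4779736, 710231, 6392017, 2915918, 37253956]

print(weighted_mean_sketch(rates, populations))
```

If you already depend on NumPy, np.average(values, weights=weights) computes the same quantity without an extra library.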

Weighted Median

ws.weighted_median(state['Murder.Rate'], weights=state['Population'])
4.4
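
For intuition, a weighted median can be sketched in pure Python as follows. The helper name weighted_median_sketch is hypothetical, and the tie-breaking rule here (the "lower" weighted median) is an assumption; libraries, including weightedstats, may handle ties differently, so results can differ in edge cases.

```python
def weighted_median_sketch(values, weights):
    """Lower weighted median: the smallest value at which the cumulative
    weight reaches at least half of the total weight.

    Hypothetical helper for illustration only; not the weightedstats code.
    """
    pairs = sorted(zip(values, weights))  # sort by value
    half = sum(w for _, w in pairs) / 2
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


# Only the five rows shown in the question, not the full data set
rates = [5.7, 5.6, 4.7, 5.6, 4.4]
populations = [4779736, 710231, 6392017, 2915918, 37253956]

print(weighted_median_sketch(rates, populations))
```

On these five rows the sketch returns 4.4, because California alone accounts for more than half of their combined population.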

The library also provides dedicated weighted mean and median functions for use with NumPy arrays. The functions above will already work, but these are available if you need them.

my_data = [1, 2, 3, 4, 5]
my_weights = [10, 1, 1, 1, 9]

ws.numpy_weighted_mean(my_data, weights=my_weights)
ws.numpy_weighted_median(my_data, weights=my_weights)
