Reputation: 196
I have a Pandas DataFrame extracted from Estespark Weather for the dates between Sep-2009 and Oct-2018, and the mean of the Average windspeed column is 4.65. I am taking a challenge where there is a sanity check where the mean of this column needed to be 4.64. How can I modify the values of this column so that the mean of this column becomes 4.64? Is there any code solution for this, or do we have to do it manually?
Upvotes: 0
Views: 33
Reputation: 368
I can see two solutions:
df['AvgWS'] -= 0.01
current_mean = 4.65
desired_mean = 4.64
n_rows = len(df['AvgWS'])
df['can_remove'] = df['AvgWS'].map(lambda x: (current_mean*n_rows - x)/(n_rows-1) == 4.64)
This will create a new boolean column in your dataframe with True
in the rows that, if removed, make the rest of the column's mean = 4.64. If there are more than one you can analyse them to choose which one seems less important and then remove that one.
Upvotes: 1