Reputation: 103
I have a pandas dataframe containing the below data, and i would like to add a new column which, for each date, returns the most frequently occurring 'weather_type' over the previous 3 days. Where the result is a tie, i'd like the most recent 'weather_type' to be returned.
d = {'date': ['17/02/2017', '18/02/2017', '19/02/2017', '20/02/2017',
'21/02/2017', '22/02/2017'], 'precipmm': [1, 0, 3, 2, 7, 8], 'weather_type':
['rain', 'sun', 'rain', 'snow', 'snow', 'rain']}
df = pd.DataFrame(data=d)
df['date']=pd.to_datetime(df['date'], format='%d/%m/%Y')
df['rollingsum_precipmm']=df['precipmm'].rolling(window=3).sum()
I've already managed to create a new column containing the sum of the total 'precipmm' over the last 3 days using the below:
df['rollingsum_precipmm']=df['precipmm'].rolling(window=3).sum()
I suspect the answer revolves around this, but as yet i've been unable to find a solution.
Any help much appreciated as always
Colin
Upvotes: 7
Views: 5915
Reputation: 11
For the result to be appeared in a new column:
df=df.assign(new_column=df['precipmm'].rolling(window=7).apply(lambda x: mode(x)[0]))
Upvotes: 1
Reputation: 9018
To get rolling mode, you can do:
from scipy.stats import mode
df['precipmm'].rolling(window=7).apply(lambda x: mode(x)[0])
Upvotes: 10