Reputation: 71
I would like to have a new column with the most frequent rolling value from another str/object
column.
date name state
0 2024-02-29 Alice CA
1 2024-02-27 Bob HI
2 2024-02-29 Cindy ID
3 2024-02-25 Dan MT
4 2024-02-29 Elliott CA
.. ... ... ...
I am trying to get the most frequent rolling state
(for each row).
I have tried different combinations (and subsets) of
.rolling()
.apply()
.mode()
which usually generates one of a handful of errors complaining that the column is non-numeric. I understand what the error is telling me - that it expects to aggregate and apply a numeric function (.mean()
.sum()
...) - but its not even getting to the .apply()
function...
def fail_now(x):
raise Exception('wow! we made it here!')
>>> df['state'].rolling(window=25).apply(fail_now)
...
pandas.errors.DataError: No numeric types to aggregate
>>> df[['state']].rolling(window=25).apply(fail_now)
...
pandas.errors.DataError: Cannot aggregate non-numeric type: object
>>> df[['state']].rolling(window=25)['state'].apply(fail_now)
...
pandas.errors.DataError: No numeric types to aggregate
I also tried a multitude of different things, including the raw
flag in .apply()
with no luck
Upvotes: 2
Views: 67
Reputation: 262214
rolling
only works with numeric values, you could first factorize
your data (to have numbers), then map
the original names:
a, idx = pd.factorize(df['state'])
df['rolling_mode'] = (pd.Series(a, index=df.index)
.rolling(3, min_periods=1) # window=3 for the demo
.apply(lambda x: x.mode()[0])
.map(dict(enumerate(idx)))
)
Output:
date name state rolling_mode
0 2024-02-29 Alice CA CA
1 2024-02-27 Bob HI CA
2 2024-02-29 Cindy ID CA
3 2024-02-25 Dan MT HI
4 2024-02-29 Elliott CA CA
Upvotes: 0