ScalaBoy

Reputation: 3392

How to find most frequent values per batch using "rolling(window)"?

I want to apply a rolling window function to the y_train DataFrame, which has a single column:

0
0
1
..
2
0
3
0

Unique values in y_train:

np.unique(y_train.values)

> array([0, 1, 2, 3])

When I apply this code, I get float values in y_train:

window = 20
y_train = y_train.rolling(window).median().dropna()

New unique values in y_train:

np.unique(y_train.values)

> array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. ])

How can I apply a rolling window function that returns the most frequent value (the mode) in each window instead of the median?

Upvotes: 2

Views: 370

Answers (1)

Divakar

Reputation: 221624

We could use scipy.stats.mode along with apply() -

In [57]: a
Out[57]: 
0    2
1    3
2    2
3    2
4    7
5    3
6    2
7    4
8    6
9    3
dtype: int64

In [58]: from scipy import stats

In [59]: modeval = lambda x : stats.mode(x)[0]

In [60]: a.rolling(window=5).apply(modeval).dropna()
Out[60]: 
4    2.0
5    2.0
6    2.0
7    2.0
8    2.0
9    3.0
dtype: float64
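
If SciPy is not available, roughly the same result can be obtained with NumPy alone. The following is only a sketch, not part of the approach above: the names rolling_mode and most_frequent are hypothetical helpers, and ties are assumed to go to the smallest value. Because rolling().apply() returns floats, the result is cast back to the original integer dtype after dropna().

```python
import numpy as np
import pandas as pd


def rolling_mode(series: pd.Series, window: int) -> pd.Series:
    """Most frequent value in each full window (hypothetical helper).

    Ties are resolved in favour of the smallest value, because
    np.unique returns sorted values and np.argmax picks the first maximum.
    """

    def most_frequent(values: np.ndarray) -> float:
        uniques, counts = np.unique(values, return_counts=True)
        return uniques[np.argmax(counts)]

    # raw=True passes each window as a NumPy array instead of a Series.
    result = series.rolling(window).apply(most_frequent, raw=True).dropna()
    # rolling().apply() yields float64; restore the original dtype.
    return result.astype(series.dtype)


a = pd.Series([2, 3, 2, 2, 7, 3, 2, 4, 6, 3])
modes = rolling_mode(a, window=5)
print(modes)
```

For the question's data this would be called as rolling_mode(y_train[col], 20) on the single column (col being whatever that column is named), which keeps the labels as integers rather than the 0.5 steps produced by median().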

Upvotes: 1
