00__00__00
00__00__00

Reputation: 5327

rolling unique value count in pandas across multiple columns

there are several answers around rolling count in pandas Rolling unique value count in pandas How to efficiently compute a rolling unique count in a pandas time series?

How do I count unique values across multiple columns? For one column, I can do:

df[my_col]=df[my_col].rolling(300).apply(lambda x: len(np.unique(x)))

How to extend to multipe columns, counting unique values overall across all values in the rolling window?

Upvotes: 1

Views: 141

Answers (2)

Laurent B.
Laurent B.

Reputation: 2263

I took a dataframe as a example (3-rows rolling window taking into account all the columns at the same time)

Dataframe for visualization

   col1  col2  col3
0     1     1     1
1     1     1     4
2     2     5     2
3     3     3     3
4     3     7     3
5     5     3     9
6     8     8     2

Proposed script for checkings

import pandas as pd

df = pd.DataFrame({'col1':[1, 1, 2, 3, 3, 5, 8],
                   'col2':[1, 1, 5, 3, 7, 3, 8],
                   'col3':[1, 4, 2, 3, 3, 9, 2],})

df['count'] = df.rolling(3).apply(lambda w: len(set(df.iloc[w.index].to_numpy().flatten())))['col1']

print(df)

Output

   col1  col2  col3  count
0     1     1     1    NaN
1     1     1     4    NaN
2     2     5     2    4.0
3     3     3     3    5.0
4     3     7     3    4.0
5     5     3     9    4.0
6     8     8     2    6.0

Another method

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1':[1, 1, 2, 3, 3, 5, 8],
                   'col2':[1, 1, 5, 3, 7, 3, 8],
                   'col3':[1, 4, 2, 3, 3, 9, 2],})

df = (df.assign( count=df.rolling(3, method='table')
        .apply(lambda d:len(set(d.flatten()) ), raw=True, engine="numba")
        .iloc[:,-1:] )
                 )

Upvotes: 1

Shubham Sharma
Shubham Sharma

Reputation: 71689

Inside a list comprehension iterate over the rolling windows and for each window flatten the values in required columns then use set to get the distinct elements

cols = [...] # define your cols here
df['count'] = [len(set(w[cols].values.ravel())) for w in df.rolling(300)]

Upvotes: 1

Related Questions