rolling uniques across groups + time in pandas

Question

I'm having trouble calculating rolling 7-day unique users by group in a group-user-date dataset. It's a classic metric, and I figured someone could help me do this in pandas.

Example data:

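from io import StringIO  # Python 3 location; Python 2 used `from StringIO import StringIO`
import pandas as pd

# Rows are left-aligned so the leading spaces do not end up inside the grp1 values
data = StringIO("""grp1,user,date
a,1,2016-10-10
a,1,2016-10-09
a,1,2016-10-07
a,2,2016-10-09
a,2,2016-10-06
a,3,2016-10-10
a,3,2016-10-09
""")

# parse_dates makes 'date' a datetime column instead of plain strings
df = pd.read_csv(data, parse_dates=['date'])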

For this simple dataset, I want to return:

    a, 2016-10-10, 3  <- 3 users were in group a in the 7 days ending 10/10
    a, 2016-10-09, 3  <- 3 users were in group a in the 7 days ending 10/09
    a, 2016-10-07, 2  <- 2 users were in group a in the 7 days ending 10/07
    a, 2016-10-06, 1  <- 1 user was in group a in the 7 days ending 10/06

I don't mind if it's a transform of the original dataset or an aggregation.

I have tried 1) a lot of searching and 2) many variations of:

from datetime import datetime, timedelta

rolling_uniques = lambda x: x['user'].unique().size if x['date'] + timedelta(days=6) <= x['date'].max() else 0

df.apply(rolling_uniques, axis=1)

OR

df.groupby(['grp1', 'user', 'date']).transform(rolling_uniques)

but nothing is working out. My real data has multiple group columns and, of course, more categories within grp1 than just 'a'.
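
For reference, here is a minimal sketch of one way the metric described above could be computed. It is not a definitive answer. The helper name `rolling_unique_users`, its parameters, and the `unique_users` column are made up for illustration, and it assumes pandas 1.2 or later (for `merge(how="cross")`) and that `date` has already been parsed to datetimes. The idea is to let each activity row count toward every window ending on its own date or on any of the following 6 days, then count distinct users per group and window end.

```python
from io import StringIO

import pandas as pd

csv_text = """grp1,user,date
a,1,2016-10-10
a,1,2016-10-09
a,1,2016-10-07
a,2,2016-10-09
a,2,2016-10-06
a,3,2016-10-10
a,3,2016-10-09
"""
df = pd.read_csv(StringIO(csv_text), parse_dates=["date"])


def rolling_unique_users(frame, group_cols, window_days=7):
    """Hypothetical helper: distinct users per group in the window ending on each date."""
    # One row per (group, user, date); repeated rows would not change a distinct count.
    events = frame[group_cols + ["user", "date"]].drop_duplicates()

    # Offsets of 0..window_days-1 days: an event on day d falls inside the
    # windows ending on d, d+1, ..., d+window_days-1.
    offsets = pd.DataFrame(
        {"offset": pd.to_timedelta(list(range(window_days)), unit="D")}
    )
    expanded = events.merge(offsets, how="cross")  # assumes pandas >= 1.2
    expanded["window_end"] = expanded["date"] + expanded["offset"]

    # Distinct users per (group, window end date).
    counts = (
        expanded.groupby(group_cols + ["window_end"])["user"]
        .nunique()
        .rename("unique_users")
        .reset_index()
        .rename(columns={"window_end": "date"})
    )

    # Keep only end dates that actually occur in the data for each group.
    observed = events[group_cols + ["date"]].drop_duplicates()
    result = observed.merge(counts, on=group_cols + ["date"])

    # Groups ascending, dates descending, to match the layout in the question.
    order = [True] * len(group_cols) + [False]
    return result.sort_values(group_cols + ["date"], ascending=order).reset_index(drop=True)


result = rolling_unique_users(df, ["grp1"])
print(result)
```

On the example data this gives 3, 3, 2 and 1 unique users for 2016-10-10, 2016-10-09, 2016-10-07 and 2016-10-06. Passing more column names in `group_cols` should cover the multiple-group case. Note that the cross join multiplies the row count by `window_days`, so this trades memory for simplicity.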
