mlx
mlx

Reputation: 514

Dynamic aggregating in pandas

My original dataframe looks like this :

A           B   C
27/03/2018  u1  Yes 
28/03/2018  u2  Yes
02/04/2018  u2  No
04/04/2018  u1  Yes
05/04/2018  u1  No
10/04/2018  u2  Yes 
11/04/2018  u1  No
16/04/2018  u2  Yes
13/05/2018  u2  No

I'd like to get a new column that does this : in each row, it will say how many "Yes" were found for the item in B during the 7 prior days to the actual date. As such, in this example, it would look like this :

A           B   C    D
27/03/2018  u1  Yes  0
28/03/2018  u2  Yes  0
02/04/2018  u2  No   1
04/04/2018  u1  Yes  1
05/04/2018  u1  No   1
10/04/2018  u2  Yes  0
11/04/2018  u1  No   1
16/04/2018  u2  Yes  1
13/05/2018  u2  No   0

I would like to get another column that would do the same but with the prior 30 days. I guess that figuring out how to do it for 7 days would make the rest easier.

P.S : I would also like to do the same but with the actual weeks instead of previous days (as such, for friday 01/06/2018, we would consider the week from 21 to 27 May).

How can this be achieved ?

Upvotes: 0

Views: 179

Answers (1)

jezrael
jezrael

Reputation: 863791

IIUC need:

def f(x):
    d = x['A'] - pd.Timedelta(7, unit='d')
    a = (df.A.between(d, x.A) & (df.B == x.B) & (df.C == 'Yes'))
    #print (df[a])
    return a.sum()

df['new'] = df.apply(f, axis=1)
print (df)
           A   B    C  new
0 2018-03-27  u1  Yes    1
1 2018-03-28  u2  Yes    1
2 2018-02-04  u2   No    0
3 2018-04-04  u1  Yes    1
4 2018-05-04  u1   No    0
5 2018-10-04  u2  Yes    1
6 2018-11-04  u1   No    0
7 2018-04-16  u2  Yes    1
8 2018-05-13  u2   No    0

EDIT:

week = df.A.dt.to_period('W')
df['start'] = week.apply(lambda r: r.start_time) - pd.Timedelta(7, unit='d')
df['end'] = week.apply(lambda r: r.end_time) + pd.Timedelta(1, unit='ns') - pd.Timedelta(8, unit='d')

def f(x):
    return (df.A.between(x.start, x.end) & (df.B == x.B) & (df.C == 'Yes')).sum()

df['new'] = df.apply(f, axis=1)
print (df)
           A   B    C      start        end  new
0 2018-03-27  u1  Yes 2018-03-19 2018-03-25    0
1 2018-03-28  u2  Yes 2018-03-19 2018-03-25    0
2 2018-02-04  u2   No 2018-01-22 2018-01-28    0
3 2018-04-04  u1  Yes 2018-03-26 2018-04-01    1
4 2018-05-04  u1   No 2018-04-23 2018-04-29    0
5 2018-10-04  u2  Yes 2018-09-24 2018-09-30    0
6 2018-11-04  u1   No 2018-10-22 2018-10-28    0
7 2018-04-16  u2  Yes 2018-04-09 2018-04-15    0
8 2018-05-13  u2   No 2018-04-30 2018-05-06    0

Upvotes: 1

Related Questions