Reputation: 609
I'm looking for the pandas equivalent of the SQL window aggregate function OVER()
clients = np.array(['c1', 'c2'])
df = pd.DataFrame({'client' : clients[np.random.randint(0,2,10)],\
'day_w' : np.random.randint(1,8,25) })
df.groupby(['client','day_w']).size()
I would like to add the following information to the DataFrame, for each row : Number of disctinct days per client.
Thanks in advance for your help
Upvotes: 1
Views: 423
Reputation: 394169
I don't think you need to groupby the week day also, just call transform
and pass func nunique
which will return the number of distinct days:
OK I think I know what you're after now:
In [98]:
df['days per client'] = df.groupby(['client'])['day_w'].transform(pd.Series.nunique)
df
Out[98]:
client day_w days per client
0 c2 3 6
1 c1 2 7
2 c1 5 7
3 c1 5 7
4 c2 4 6
5 c2 3 6
6 c2 7 6
7 c1 3 7
8 c2 2 6
9 c1 2 7
10 c1 6 7
11 c1 6 7
12 c1 4 7
13 c2 5 6
14 c1 6 7
15 c2 5 6
16 c2 7 6
17 c2 7 6
18 c1 2 7
19 c1 1 7
20 c2 2 6
21 c2 6 6
22 c1 7 7
23 c1 3 7
24 c1 4 7
Upvotes: 2