Reputation: 365
Good morning.
I have a pandas DataFrame like the following:
df =
p f c a
0 1 2 1 16.32
1 1 2 2 48
2 1 2 3 60
3 1 2 4 112
4 1 2 5 52
5 1 3 6 288
6 1 4 7 201
7 1 4 8 52
8 1 4 4 44
9 1 5 7 251.2
10 1 5 9 220
11 1 5 8 83
12 1 5 10 142
13 2 1 11 100
14 2 1 12 110
15 2 2 11 120
16 2 2 13 130
17 2 3 13 140
18 2 3 14 150
19 2 4 12 160
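For reproducibility, the frame above can be rebuilt with:

```python
import pandas as pd

# sample data copied from the table above
df = pd.DataFrame({
    'p': [1]*13 + [2]*7,
    'f': [2, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5,
          1, 1, 2, 2, 3, 3, 4],
    'c': [1, 2, 3, 4, 5, 6, 7, 8, 4, 7, 9, 8, 10,
          11, 12, 11, 13, 13, 14, 12],
    'a': [16.32, 48, 60, 112, 52, 288, 201, 52, 44, 251.2, 220, 83, 142,
          100, 110, 120, 130, 140, 150, 160],
})
```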
And I want to do a groupby, aggregating column c with something like SQL's COUNT(DISTINCT) and column a with sum(), both cumulatively within each p, so that my result will be:
df_result =
p f c a
0 1 2 5 288.32
1 1 3 6 576.32
2 1 4 8 873.32
3 1 5 10 1569.52
4 2 1 2 210
5 2 2 3 460
6 2 3 4 750
7 2 4 4 910
But I can't reach that result trying different combinations of groupby and stack.
EDIT Take into account that column 'c' stores ID numbers, so the ascending order is just an example; a max aggregate wouldn't work. Sorry for not saying so before.
I think a possible solution would be to split it into two different dataframes, group each one, and then merge, but I'm not sure this is the best approach.
Thank you very much in advance.
Upvotes: 3
Views: 86
Reputation: 863246
You need to aggregate with list and sum first, then call DataFrame.cumsum:
df = df.groupby('f').agg({'c':list, 'a':'sum'}).cumsum()
print (df)
c a
f
2 [154, 215, 1, 8000, 214] 288.32
3 [154, 215, 1, 8000, 214, 640] 576.32
4 [154, 215, 1, 8000, 214, 640, 780, 830, 8000] 873.32
5 [154, 215, 1, 8000, 214, 640, 780, 830, 8000, ... 1569.52
And last, get the number of unique values in each list:
df['c'] = df['c'].apply(lambda x: len(set(x)))
df = df.reset_index()
print (df)
f c a
0 2 5 288.32
1 3 6 576.32
2 4 8 873.32
3 5 10 1569.52
EDIT:
import numpy as np

df = (df.groupby(['p','f']).agg({'c':list, 'a':'sum'})
        .groupby('p').apply(np.cumsum))
df['c'] = df['c'].apply(lambda x: len(set(x)))
df = df.reset_index()
print (df)
p f c a
0 1 2 5 288.32
1 1 3 6 576.32
2 1 4 8 873.32
3 1 5 10 1569.52
4 2 1 2 210
5 2 2 3 460
6 2 3 4 750
7 2 4 4 910
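If the frames are large, building per-group lists can get expensive. A possible alternative, sketching the split-and-merge idea from the question, counts each ID only at its first appearance and never materializes lists. It assumes f is sorted ascending within each p, as in the sample data (rebuilt here to make the example self-contained):

```python
import pandas as pd

# sample data from the question, rebuilt for a self-contained example
df = pd.DataFrame({
    'p': [1]*13 + [2]*7,
    'f': [2, 2, 2, 2, 2, 3, 4, 4, 4, 5, 5, 5, 5,
          1, 1, 2, 2, 3, 3, 4],
    'c': [1, 2, 3, 4, 5, 6, 7, 8, 4, 7, 9, 8, 10,
          11, 12, 11, 13, 13, 14, 12],
    'a': [16.32, 48, 60, 112, 52, 288, 201, 52, 44, 251.2, 220, 83, 142,
          100, 110, 120, 130, 140, 150, 160],
})

# sum of 'a' per (p, f); its index also lists every (p, f) group
a_part = df.groupby(['p', 'f'])['a'].sum()

# each ID contributes once, at the first f where it appears within its p
firsts = df.drop_duplicates(['p', 'c']).groupby(['p', 'f']).size()

# reindex so (p, f) groups whose IDs were all seen earlier count as 0,
# then cumulate the distinct counts within each p
c_part = (firsts.reindex(a_part.index, fill_value=0)
                .groupby(level='p').cumsum())

# merge both parts back into one result frame
res = pd.concat([c_part.rename('c'),
                 a_part.groupby(level='p').cumsum().rename('a')],
                axis=1).reset_index()
print(res)
```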
Upvotes: 1