Guitarman045
Guitarman045

Reputation: 37

How to sum column in reverse order in pandas with groupby

I currently need to replicate this dataset where I must groupby subjectID copy and count how many have a score of 1 in the future. I must count them in reverse basically however I'm not sure how to do that and groupby subject ID at the same time.

   SubjectID copy  Score  Number of All Future Hosp for O column
0        phchp003      1                                       4
1        phchp003      1                                       3
2        phchp003      1                                       2
3        phchp003      1                                       1
4        phchp003      1                                       0
5        phchp004      1                                       4
6        phchp004      1                                       3
7        phchp004      1                                       2
8        phchp004      1                                       1
9        phchp004      1                                       0
10       phchp006      0                                       3
11       phchp006      0                                       3
12       phchp006      0                                       3
13       phchp006      0                                       3
14       phchp006      1                                       2
15       phchp006      1                                       1
16       phchp006      1                                       0

I currently have

data['Sum']= data.groupby(['SubjectID copy'])['Score'].cumsum()

which gives me the values but summed in descending order, where I need mine to go from bottom up.

Upvotes: 0

Views: 159

Answers (2)

ALollz
ALollz

Reputation: 59549

To find how many 1s come in the future transform the group sum and subtract off the cumsum.

df['Sum'] = (df.groupby('SubjectID copy')['Score'].transform('sum')
             - df.groupby('SubjectID copy')['Score'].cumsum())

   SubjectID copy  Score  Sum
0        phchp003      1    4
1        phchp003      1    3
2        phchp003      1    2
3        phchp003      1    1
4        phchp003      1    0
5        phchp004      1    4
6        phchp004      1    3
7        phchp004      1    2
8        phchp004      1    1
9        phchp004      1    0
10       phchp006      0    3
11       phchp006      0    3
12       phchp006      0    3
13       phchp006      0    3
14       phchp006      1    2
15       phchp006      1    1
16       phchp006      1    0

Upvotes: 1

Henry Ecker
Henry Ecker

Reputation: 35646

We can use loc to reverse before using groupby transform. We can then use shift and cumsum to only consider "future" values:

data['Sum'] = (
    data.loc[::-1]  # Reverse DataFrame
        .groupby(['SubjectID copy'])['Score']  # Groups
        .transform(lambda s: s.shift(fill_value=0).cumsum())  # handle transformation
)

data:

   SubjectID copy  Score  Sum
0        phchp003      1    4
1        phchp003      1    3
2        phchp003      1    2
3        phchp003      1    1
4        phchp003      1    0
5        phchp004      1    4
6        phchp004      1    3
7        phchp004      1    2
8        phchp004      1    1
9        phchp004      1    0
10       phchp006      0    3
11       phchp006      0    3
12       phchp006      0    3
13       phchp006      0    3
14       phchp006      1    2
15       phchp006      1    1
16       phchp006      1    0

Upvotes: 1

Related Questions