Reputation: 37
I currently need to replicate this dataset where I must groupby subjectID copy and count how many have a score of 1 in the future. I must count them in reverse basically however I'm not sure how to do that and groupby subject ID at the same time.
SubjectID copy Score Number of All Future Hosp for O column
0 phchp003 1 4
1 phchp003 1 3
2 phchp003 1 2
3 phchp003 1 1
4 phchp003 1 0
5 phchp004 1 4
6 phchp004 1 3
7 phchp004 1 2
8 phchp004 1 1
9 phchp004 1 0
10 phchp006 0 3
11 phchp006 0 3
12 phchp006 0 3
13 phchp006 0 3
14 phchp006 1 2
15 phchp006 1 1
16 phchp006 1 0
I currently have
data['Sum']= data.groupby(['SubjectID copy'])['Score'].cumsum()
which gives me the values but summed in descending order, where I need mine to go from bottom up.
Upvotes: 0
Views: 159
Reputation: 59549
To find how many 1
s come in the future transform
the group sum
and subtract off the cumsum
.
df['Sum'] = (df.groupby('SubjectID copy')['Score'].transform('sum')
- df.groupby('SubjectID copy')['Score'].cumsum())
SubjectID copy Score Sum
0 phchp003 1 4
1 phchp003 1 3
2 phchp003 1 2
3 phchp003 1 1
4 phchp003 1 0
5 phchp004 1 4
6 phchp004 1 3
7 phchp004 1 2
8 phchp004 1 1
9 phchp004 1 0
10 phchp006 0 3
11 phchp006 0 3
12 phchp006 0 3
13 phchp006 0 3
14 phchp006 1 2
15 phchp006 1 1
16 phchp006 1 0
Upvotes: 1
Reputation: 35646
We can use loc
to reverse before using groupby transform
. We can then use shift
and cumsum
to only consider "future" values:
data['Sum'] = (
data.loc[::-1] # Reverse DataFrame
.groupby(['SubjectID copy'])['Score'] # Groups
.transform(lambda s: s.shift(fill_value=0).cumsum()) # handle transformation
)
data
:
SubjectID copy Score Sum
0 phchp003 1 4
1 phchp003 1 3
2 phchp003 1 2
3 phchp003 1 1
4 phchp003 1 0
5 phchp004 1 4
6 phchp004 1 3
7 phchp004 1 2
8 phchp004 1 1
9 phchp004 1 0
10 phchp006 0 3
11 phchp006 0 3
12 phchp006 0 3
13 phchp006 0 3
14 phchp006 1 2
15 phchp006 1 1
16 phchp006 1 0
Upvotes: 1