Reputation: 8033
I have a dataframe as below
id val type
aa 0 C
aa 1 T
aa 2 T
aa 3 T
aa 0 M
aa 1 M
aa 2 C
aa 3 M
bbb 0 C
bbb 1 T
bbb 2 T
bbb 3 T
bbb 0 M
bbb 1 M
bbb 2 C
bbb 3 T
cccccc 0 C
cccccc 1 T
cccccc 2 T
cccccc 3 T
cccccc 0 M
cccccc 1 M
cccccc 0 C
cccccc 1 C
I want to do a groupby
"ID" and then sum
& count
the rows in column "val", however the rows that should be summed are only the rows that contain the "type" same as the last value of column "type" in each group.
For example the last row of group has "type" M, so only the rows with "type" M in the group be summed & counted. So, values 0,1 & 3 needs to be summed & the count is 3.
The expected output for the df above is as below. Column "type" in the output is not mandatory, it can be left out if it takes more time to get that. I am showing it here just o give clarity on what i want to achieve.
id val count type
aa 4 3 M
bbb 9 4 T
cccccc 1 3 C
Upvotes: 1
Views: 53
Reputation: 1234
# find the last type by group
last_type = df[["id", "type"]].groupby("id").tail(1)
df['count'] = 1
# inner merge with the last type (i.e. keep only the last type by group)
df.merge(last_type, on=["id", "type"], how="inner").groupby(["id", "type"]).agg({'val':'sum', 'count': 'sum'}).reset_index()
Output
id type val count
0 aa M 4 3
1 bbb T 9 4
2 cccccc C 1 3
Upvotes: 1
Reputation: 862791
Use GroupBy.transform
with last
for filtering and then aggregate by named aggregations, working for pandas 0.25+:
df = (df[df['type'].eq(df.groupby('id')['type'].transform('last'))]
.groupby('id').agg(val=('val', 'sum'),
count=('val', 'size'),
type=('type','last')))
print (df)
val count type
id
aa 4 3 M
bbb 9 4 T
cccccc 1 3 C
Another solution with Series.map
by Series created by DataFrame.drop_duplicates
and DataFrame.set_index
:
s = df.drop_duplicates('id', keep='last').set_index('id')['type']
df = (df[df['type'].eq(df['id'].map(s))]
.groupby('id').agg(val=('val', 'sum'),
count=('val', 'size'),
type=('type','last')))
print (df)
val count type
id
aa 4 3 M
bbb 9 4 T
cccccc 1 3 C
Upvotes: 2