Sum & count of a column based on the content of the last value in each group after group-by

Question

I have a dataframe as below

id       val    type
aa         0    C
aa         1    T
aa         2    T
aa         3    T
aa         0    M
aa         1    M
aa         2    C
aa         3    M
bbb        0    C
bbb        1    T
bbb        2    T
bbb        3    T
bbb        0    M
bbb        1    M
bbb        2    C
bbb        3    T
cccccc     0    C
cccccc     1    T
cccccc     2    T
cccccc     3    T
cccccc     0    M
cccccc     1    M
cccccc     0    C
cccccc     1    C

I want to do a groupby "ID" and then sum & count the rows in column "val", however the rows that should be summed are only the rows that contain the "type" same as the last value of column "type" in each group.

For example the last row of group has "type" M, so only the rows with "type" M in the group be summed & counted. So, values 0,1 & 3 needs to be summed & the count is 3.

The expected output for the df above is as below. Column "type" in the output is not mandatory, it can be left out if it takes more time to get that. I am showing it here just o give clarity on what i want to achieve.

id     val  count   type
aa       4  3       M
bbb      9  4       T
cccccc   1  3       C

jezrael · Accepted Answer

Use GroupBy.transform with last for filtering and then aggregate by named aggregations, working for pandas 0.25+:

df = (df[df['type'].eq(df.groupby('id')['type'].transform('last'))]
            .groupby('id').agg(val=('val', 'sum'), 
                               count=('val', 'size'), 
                               type=('type','last')))
print (df)
        val  count type
id                     
aa        4      3    M
bbb       9      4    T
cccccc    1      3    C

Another solution with Series.map by Series created by DataFrame.drop_duplicates and DataFrame.set_index:

s = df.drop_duplicates('id', keep='last').set_index('id')['type']
df = (df[df['type'].eq(df['id'].map(s))]
            .groupby('id').agg(val=('val', 'sum'), 
                               count=('val', 'size'), 
                               type=('type','last')))
print (df)
        val  count type
id                     
aa        4      3    M
bbb       9      4    T
cccccc    1      3    C

Sum & count of a column based on the content of the last value in each group after group-by

Answers (2)

Related Questions

Sum &amp; count of a column based on the content of the last value in each group after group-by

Answers (2)

Related Questions

Sum & count of a column based on the content of the last value in each group after group-by