moys
moys

Reputation: 8033

Sum & count of a column based on the content of the last value in each group after group-by

I have a dataframe as below

id       val    type
aa         0    C
aa         1    T
aa         2    T
aa         3    T
aa         0    M
aa         1    M
aa         2    C
aa         3    M
bbb        0    C
bbb        1    T
bbb        2    T
bbb        3    T
bbb        0    M
bbb        1    M
bbb        2    C
bbb        3    T
cccccc     0    C
cccccc     1    T
cccccc     2    T
cccccc     3    T
cccccc     0    M
cccccc     1    M
cccccc     0    C
cccccc     1    C

I want to do a groupby "ID" and then sum & count the rows in column "val", however the rows that should be summed are only the rows that contain the "type" same as the last value of column "type" in each group.

For example the last row of group has "type" M, so only the rows with "type" M in the group be summed & counted. So, values 0,1 & 3 needs to be summed & the count is 3.

The expected output for the df above is as below. Column "type" in the output is not mandatory, it can be left out if it takes more time to get that. I am showing it here just o give clarity on what i want to achieve.

id     val  count   type
aa       4  3       M
bbb      9  4       T
cccccc   1  3       C

Upvotes: 1

Views: 53

Answers (2)

9mat
9mat

Reputation: 1234

# find the last type by group
last_type = df[["id", "type"]].groupby("id").tail(1)


df['count'] = 1

# inner merge with the last type (i.e. keep only the last type by group)
df.merge(last_type, on=["id", "type"], how="inner").groupby(["id", "type"]).agg({'val':'sum', 'count': 'sum'}).reset_index()

Output

       id type  val  count
0      aa    M    4      3
1     bbb    T    9      4
2  cccccc    C    1      3

Upvotes: 1

jezrael
jezrael

Reputation: 862791

Use GroupBy.transform with last for filtering and then aggregate by named aggregations, working for pandas 0.25+:

df = (df[df['type'].eq(df.groupby('id')['type'].transform('last'))]
            .groupby('id').agg(val=('val', 'sum'), 
                               count=('val', 'size'), 
                               type=('type','last')))
print (df)
        val  count type
id                     
aa        4      3    M
bbb       9      4    T
cccccc    1      3    C

Another solution with Series.map by Series created by DataFrame.drop_duplicates and DataFrame.set_index:

s = df.drop_duplicates('id', keep='last').set_index('id')['type']
df = (df[df['type'].eq(df['id'].map(s))]
            .groupby('id').agg(val=('val', 'sum'), 
                               count=('val', 'size'), 
                               type=('type','last')))
print (df)
        val  count type
id                     
aa        4      3    M
bbb       9      4    T
cccccc    1      3    C

Upvotes: 2

Related Questions