Reputation: 341
Sorry if this has been asked before; I could not find an exact match.
I am looking for the most efficient way in pandas to do the following operation.
Let's say we have the following table:
   ID   SUB_ID  COND
1  101  1       1
2  101  2       1
3  101  3       1
4  102  1       1
5  102  2       0
6  103  1       0
7  103  2       0
8  103  3       0
9  103  4       0
Basically, for each "ID" we want the largest "SUB_ID" among the rows where COND is 1. Ideally we would add this value to every row of that ID as a new column. If no row of that ID fulfills the condition, we would like to add a 0 (instead of null).
The resulting dataframe would be:
   ID   SUB_ID  COND  MAX_SUB_ID
1  101  1       1     3
2  101  2       1     3
3  101  3       1     3
4  102  1       1     1
5  102  2       0     1
6  103  1       0     0
7  103  2       0     0
8  103  3       0     0
9  103  4       0     0
The best way I can come up with right now is to select only the rows where COND == 1, do a groupby on that dataframe to get the max SUB_ID per ID, and then join the result back to the main dataframe. After this I can change the nulls back to 0.
df_true = df[df['COND'] == 1]
max_subid_true = df_true['SUB_ID'].groupby(df_true['ID']).max()
joined_df = df.merge(pd.DataFrame(max_subid_true), how='left', left_on='ID', right_index=True)
joined_df.loc[pd.isnull(joined_df['SUB_ID_y']), 'SUB_ID_y'] = 0
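For reference, here is a minimal self-contained sketch of the same filter, groupby and join idea, so it can be run as-is. The sample frame is rebuilt from the table above. The name `max_true` and the choice to rename the aggregated column to `MAX_SUB_ID` before joining (which avoids the `SUB_ID_x` / `SUB_ID_y` suffixes) are illustrative assumptions, not part of the snippet above.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (index 1..9).
df = pd.DataFrame(
    {
        "ID": [101, 101, 101, 102, 102, 103, 103, 103, 103],
        "SUB_ID": [1, 2, 3, 1, 2, 1, 2, 3, 4],
        "COND": [1, 1, 1, 1, 0, 0, 0, 0, 0],
    },
    index=range(1, 10),
)

# Largest SUB_ID per ID among rows where COND == 1.
# Renaming the Series (an illustrative choice) avoids a column-name clash on join.
max_true = (
    df[df["COND"] == 1]
    .groupby("ID")["SUB_ID"]
    .max()
    .rename("MAX_SUB_ID")
)

# Left-join on the ID column; IDs with no qualifying row get NaN.
joined_df = df.join(max_true, on="ID")

# Replace the NaNs with 0 and restore an integer dtype.
joined_df["MAX_SUB_ID"] = joined_df["MAX_SUB_ID"].fillna(0).astype(int)

print(joined_df)
```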
Any ideas on doing this differently?
Upvotes: 3
Views: 1066
Reputation: 294586
df.assign(MAX_SUB_ID=df.SUB_ID.mul(df.COND).groupby(df.ID).transform('max'))
ID SUB_ID COND MAX_SUB_ID
1 101 1 1 3
2 101 2 1 3
3 101 3 1 3
4 102 1 1 1
5 102 2 0 1
6 103 1 0 0
7 103 2 0 0
8 103 3 0 0
9 103 4 0 0
caveats
SUB_ID is always positive
COND is always 1 or 0
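To make the caveats concrete, here is a minimal sketch of a variant that masks with `Series.where` instead of multiplying, so it should not depend on the sign of SUB_ID. The helper name `max_sub_id_where` and the small frame `neg` with negative SUB_ID values are made up for illustration and are not from the question.

```python
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame(
    {
        "ID": [101, 101, 101, 102, 102, 103, 103, 103, 103],
        "SUB_ID": [1, 2, 3, 1, 2, 1, 2, 3, 4],
        "COND": [1, 1, 1, 1, 0, 0, 0, 0, 0],
    }
)


def max_sub_id_where(frame):
    """Hypothetical helper: per-row max SUB_ID of the row's ID where COND == 1, else 0."""
    # Rows that fail the condition become NaN, which the group-wise max skips.
    masked = frame["SUB_ID"].where(frame["COND"] == 1)
    per_row = masked.groupby(frame["ID"]).transform("max")
    # Groups with no qualifying row are all-NaN; map those to 0.
    return per_row.fillna(0).astype(int)


result = df.assign(MAX_SUB_ID=max_sub_id_where(df))

# Made-up frame with negative SUB_ID values to show where the mul trick breaks.
neg = pd.DataFrame({"ID": [201, 201], "SUB_ID": [-5, -2], "COND": [1, 0]})

# Multiplying turns the non-qualifying row into 0, which beats the qualifying -5.
via_mul = neg["SUB_ID"].mul(neg["COND"]).groupby(neg["ID"]).transform("max")

# Masking keeps only the qualifying row, so the group max is -5.
via_where = max_sub_id_where(neg)

print(result)
print(via_mul.tolist(), via_where.tolist())
```

On the question's data both approaches give the same column; they only diverge once SUB_ID can be zero or negative, or COND takes values other than 0 and 1.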
alternative (with fewer caveats, but less fun)
df.assign(MAX_SUB_ID=df.ID.map(df.query('COND == 1').groupby('ID').SUB_ID.max()) \
.fillna(0).astype(int))
Upvotes: 2