user3222101
user3222101

Reputation: 1330

how to use group by in filter condition in pandas

I have below data stored in a dataframe and I want to remove the rows that has id equal to finalid and for the same id, i have multiple rows.

example: df_target

id finalid month year count_ph count_sh
1   1      1     2012  12        20
1   2      1     2012   6         18
1   32     1     2012   6         2
2   2      1     2012   2         6
2   23     1     2012   2         6
3   3      1     2012   2         2

output

id finalid month year count_ph count_sh

    1   2    1     2012    6         18
    1   32   1     2012    6         2
    2   23   1     2012    2         6
    3   3    1     2012    2         2

functionality is something like: remove records and get the final dataframe (df_target.groupby(['id','month','year']).size() > 1) & (df_target['id'] == df_target['finalid'])

Upvotes: 2

Views: 42

Answers (1)

jezrael
jezrael

Reputation: 863031

I think need transform for same Series as origonal DataFrame and ~ for invert final boolean mask:

df = df_target[~((df_target.groupby(['id','month','year'])['id'].transform('size') > 1) & 
                 (df_target['id'] == df_target['finalid']))]

Alternative solution:

df = df_target[((df_target.groupby(['id','month','year'])['id'].transform('size') <= 1) | 
                 (df_target['id'] != df_target['finalid']))]

print (df)
   id  finalid  month  year  count_ph  count_sh
1   1        2      1  2012         6        18
2   1       32      1  2012         6         2
4   2       23      1  2012         2         6
5   3        3      1  2012         2         2

Upvotes: 1

Related Questions