Sissi

Reputation: 77

For each ID group, keep only the first value != 0 and set the others to 0

I have a pandas DataFrame with a product ID column and an Info column. For each product, I need to keep only the first Info value that is != 0 within its group; every other value in the group should be set to zero.

This is the input:

ID Info
AA 0
AA 20
AA 30
BB 10
BB 0
BB 20

This is the output I would like:

ID Info
AA 0
AA 20
AA 0
BB 10
BB 0
BB 0
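
For anyone who wants to try the answers below, the input table can be rebuilt roughly like this. It is a minimal sketch that assumes a default integer index and the column names shown above; `expected` is just a helper name for comparing results.

```python
import pandas as pd

# Sample frame copied from the input table in the question.
df = pd.DataFrame({
    "ID": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "Info": [0, 20, 30, 10, 0, 20],
})

# Desired Info column from the question, kept only for comparison.
expected = pd.Series([0, 20, 0, 10, 0, 0], name="Info")
```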

Thank you for your help.

Upvotes: 3

Views: 240

Answers (4)

Shubham Sharma

Reputation: 71687

We can compare the Info column with 0 to create a boolean mask, then flag every row where an (ID, mask) pair has already appeared, and set Info to 0 on those flagged rows:

df.loc[df.assign(m=df['Info'].ne(0)).duplicated(['ID', 'm']), 'Info'] = 0

   ID  Info
0  AA     0
1  AA    20
2  AA     0
3  BB    10
4  BB     0
5  BB     0
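
The one-liner can be easier to follow when split into steps. This is only a sketch of the same idea on the question's sample data; the intermediate names `nonzero` and `dup` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "Info": [0, 20, 30, 10, 0, 20],
})

# Step 1: True where Info is non-zero.
nonzero = df["Info"].ne(0)

# Step 2: within each (ID, nonzero) pair, every row after the first
# counts as a duplicate. 'm' is only a temporary helper column.
dup = df.assign(m=nonzero).duplicated(["ID", "m"])

# Step 3: zero out the flagged rows. Repeated zeros are flagged too,
# but overwriting them with 0 changes nothing.
df.loc[dup, "Info"] = 0
```

On the sample data only the third AA row and the third BB row are flagged, which is why the first non-zero value in each group survives.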

Upvotes: 4

ALollz

Reputation: 59579

Check the condition, then use groupby + idxmax to find the row where it first holds in each group. Use where to blank out every other row (those become NaN), then fill the blanks with 0.

Because the condition check uses the same value as the fill value, it doesn't matter that we mask all of the 0s with the where (or that idxmax will pick the first index even for groups with all 0s).

For this to work properly you must have a non-duplicated Index.

ids = df['Info'].ne(0).groupby(df['ID']).idxmax()
df['Info'] = df['Info'].where(df.index.isin(ids)).fillna(0, downcast='infer')

   ID  Info
0  AA     0
1  AA    20
2  AA     0
3  BB    10
4  BB     0
5  BB     0
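
If the index does contain duplicates, one possible workaround is to reset it first. The sketch below assumes the original index labels do not need to be preserved, and the duplicated index is made up for illustration. It also passes 0 directly to where instead of calling fillna afterwards, which is a small variation on the code above rather than the same call.

```python
import pandas as pd

# Question data with a deliberately duplicated index (illustrative assumption).
df = pd.DataFrame(
    {"ID": ["AA", "AA", "AA", "BB", "BB", "BB"],
     "Info": [0, 20, 30, 10, 0, 20]},
    index=[0, 1, 2, 0, 1, 2],
)

# Make the index unique so idxmax labels identify exactly one row each.
df = df.reset_index(drop=True)

# Label of the first non-zero Info row per ID.
ids = df["Info"].ne(0).groupby(df["ID"]).idxmax()

# Keep Info only on those rows; everything else is replaced by 0.
df["Info"] = df["Info"].where(df.index.isin(ids), 0)
```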

Upvotes: 3

Andrej Kesely

Reputation: 195543

df["Info"] = df.groupby("ID")["Info"].transform(
    lambda x: x * (x.index == x.gt(0).idxmax())
)

print(df)

Prints:

   ID  Info
0  AA     0
1  AA    20
2  AA     0
3  BB    10
4  BB     0
5  BB     0

Upvotes: 0

anky

Reputation: 75100

You can use idxmax after testing the Info column for != 0, then mask the result using where:

s = df.loc[df['Info'].ne(0).groupby(df['ID']).transform('idxmax'),'Info']
s.index = df.index
df['New_Info'] = s.where(df['Info'].eq(s),0)

print(df)

   ID  Info  New_Info
0  AA     0         0
1  AA    20        20
2  AA    30         0
3  BB    10        10
4  BB     0         0
5  BB    20         0

Upvotes: 2
