Reputation: 77
I have a DataFrame with a product ID and, for each product, some information. Within each ID group I need to keep only the first `Info` value that is != 0; all the others need to be set to zero. I am working with pandas.
This is the input:
ID | Info |
---|---|
AA | 0 |
AA | 20 |
AA | 30 |
BB | 10 |
BB | 0 |
BB | 20 |
This is the output I would like:
ID | Info |
---|---|
AA | 0 |
AA | 20 |
AA | 0 |
BB | 10 |
BB | 0 |
BB | 0 |
Thank you for your help.
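For trying the answers below, here is a minimal sketch that rebuilds the sample input and the desired output as DataFrames. The names `df` and `expected` are only illustrative.

```python
import pandas as pd

# Input frame from the question
df = pd.DataFrame({
    "ID": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "Info": [0, 20, 30, 10, 0, 20],
})

# Desired result: only the first non-zero Info per ID survives
expected = pd.DataFrame({
    "ID": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "Info": [0, 20, 0, 10, 0, 0],
})
```

For the answers that overwrite `Info` in place, the result can then be checked with `pd.testing.assert_frame_equal(df, expected)`.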
Upvotes: 3
Views: 240
Reputation: 71687
We can compare the `Info` column with `0` to create a boolean mask, then check for duplicated values of this mask per unique `ID`, and set the values in column `Info` to `0` wherever they are duplicated:
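```python
# m is True for non-zero rows; any (ID, m) pair seen before is zeroed out
df.loc[df.assign(m=df['Info'].ne(0)).duplicated(['ID', 'm']), 'Info'] = 0
print(df)
```

```
   ID  Info
0  AA     0
1  AA    20
2  AA     0
3  BB    10
4  BB     0
5  BB     0
```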
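To see why the one-liner works, here is a sketch that splits it into its intermediate steps on the sample data. The names `m` and `dup` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["AA", "AA", "AA", "BB", "BB", "BB"],
    "Info": [0, 20, 30, 10, 0, 20],
})

# Step 1: boolean mask, True where Info is non-zero
m = df["Info"].ne(0)

# Step 2: a row is flagged if its (ID, m) pair already appeared earlier,
# i.e. it is the 2nd, 3rd, ... non-zero (or zero) row of that ID
dup = df.assign(m=m).duplicated(["ID", "m"])

# Step 3: zero out every flagged row; the first non-zero row per ID is kept
df.loc[dup, "Info"] = 0
```

Zeroing a repeated zero row changes nothing, which is why the mask only needs to separate zero from non-zero values.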
Upvotes: 4
Reputation: 59579
Check the condition, then use `groupby` + `idxmax` to find the row where it first occurs in each group. Use `where` to turn every other row into NaN, then fill those with 0.
Because the value being tested for (0) is the same as the fill value, it doesn't matter that `where` also masks the rows that were already 0 (or that `idxmax` picks the group's first index even for a group that is all 0s).
For this to work properly, the DataFrame must have a unique (non-duplicated) index.
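```python
# Index label of the first non-zero row in each ID group
ids = df['Info'].ne(0).groupby(df['ID']).idxmax()
# Rows not in `ids` become NaN; fill them with 0 and cast back to the original dtype
# (`fillna(..., downcast='infer')` is deprecated in recent pandas versions)
df['Info'] = df['Info'].where(df.index.isin(ids)).fillna(0).astype(df['Info'].dtype)
```

```
   ID  Info
0  AA     0
1  AA    20
2  AA     0
3  BB    10
4  BB     0
5  BB     0
```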
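Here is a sketch of the same approach on the sample data, extended with a hypothetical all-zero group `CC`. It shows what `idxmax` returns in that case. It uses `.astype(...)` rather than `downcast='infer'` to restore the integer dtype, which is an adjustment, not part of the original answer.

```python
import pandas as pd

# Sample data plus a hypothetical extra group "CC" that contains only zeros
df = pd.DataFrame({
    "ID": ["AA", "AA", "AA", "BB", "BB", "BB", "CC", "CC"],
    "Info": [0, 20, 30, 10, 0, 20, 0, 0],
})

# Index label of the first True (first non-zero) per ID; for an all-zero
# group idxmax simply returns that group's first label
ids = df["Info"].ne(0).groupby(df["ID"]).idxmax()

# Keep Info on those rows, NaN elsewhere, then fill with 0 and restore the int dtype
df["Info"] = df["Info"].where(df.index.isin(ids)).fillna(0).astype(df["Info"].dtype)
```

Because `isin(ids)` matches on index labels, a frame with duplicate labels would keep more than one row per group. Calling `df.reset_index(drop=True)` first gives a unique index.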
Upvotes: 3
Reputation: 195543
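```python
# Within each ID, multiply Info by True only at the first positive row, zeroing the rest
df["Info"] = df.groupby("ID")["Info"].transform(
    lambda x: x * (x.index == x.gt(0).idxmax())
)
print(df)
```

Prints:

```
   ID  Info
0  AA     0
1  AA    20
2  AA     0
3  BB    10
4  BB     0
5  BB     0
```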
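This answer tests `x.gt(0)`, while the question asks for the first value `!= 0`. On non-negative data the two agree. The sketch below uses a hypothetical group containing a negative value to show where they diverge.

```python
import pandas as pd

# Hypothetical group with a negative value before the first positive one
df = pd.DataFrame({"ID": ["DD", "DD", "DD"], "Info": [0, -5, 10]})

# Answer as written: keeps the first *positive* value
gt_version = df.groupby("ID")["Info"].transform(
    lambda x: x * (x.index == x.gt(0).idxmax())
)

# Same idea with ne(0): keeps the first *non-zero* value, as the question asks
ne_version = df.groupby("ID")["Info"].transform(
    lambda x: x * (x.index == x.ne(0).idxmax())
)
```

If `Info` can be negative, `x.ne(0)` is the test that matches the question.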
Upvotes: 0
Reputation: 75100
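You can try `idxmax` after filtering for `!= 0` on the `Info` column, then mask using `where`. Note that the mask compares values, not row positions, so if a group's first non-zero value appears again later in that group, the later row is kept as well: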
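```python
# For every row, look up the first non-zero Info of its ID group
s = df.loc[df['Info'].ne(0).groupby(df['ID']).transform('idxmax'), 'Info']
s.index = df.index
# Keep Info only where it equals that first non-zero value
df['New_Info'] = s.where(df['Info'].eq(s), 0)
print(df)
```

```
   ID  Info  New_Info
0  AA     0         0
1  AA    20        20
2  AA    30         0
3  BB    10        10
4  BB     0         0
5  BB    20         0
```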
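The sketch below reproduces the repeated-value edge case with a hypothetical group `[0, 20, 20]`. It also shows a positional variant that keeps only the first occurrence by comparing index labels with the `idxmax` result. That variant is an adjustment, not part of the original answer.

```python
import pandas as pd

# Hypothetical group where the first non-zero value (20) appears again later
df = pd.DataFrame({"ID": ["AA", "AA", "AA"], "Info": [0, 20, 20]})

# Label of the first non-zero row, broadcast to every row of its group
first = df["Info"].ne(0).groupby(df["ID"]).transform("idxmax")

# Answer's approach: compare values with the group's first non-zero value
s = df.loc[first, "Info"]
s.index = df.index
by_value = s.where(df["Info"].eq(s), 0)

# Positional variant: keep Info only on the row whose label is that first label
is_first = df.index.to_numpy() == first.to_numpy()
by_position = df["Info"].where(is_first, 0)
```

On the question's sample data both versions give the same `New_Info` column. They only differ when a value repeats within a group.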
Upvotes: 2