Reputation: 588
I have a pandas dataframe df
:
s = {'id': [243,243, 243, 243,443,443,443],
'st': [1,3,5,9,2,6,7],
'value':[2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1]}
df = pd.DataFrame(s)
which looks like this:
id st value
0 243 1 2.4
1 243 3 3.8
2 243 5 3.7
3 243 9 5.6
4 443 2 1.2
5 443 6 0.2
6 443 7 2.1
I want to put 0 as value
for all the records except of the first records for each id
. My expected output is:
id st value
0 243 1 2.4
1 243 3 0
2 243 5 0
3 243 9 0
4 443 2 1.2
5 443 6 0
6 443 7 0
How can I do this with a pandas dataframe?
Upvotes: 5
Views: 71
Reputation: 75100
Another way of doing this is:
df.loc[df.id.eq(df.id.shift()),'value']=0
print(df)
id st value
0 243 1 2.4
1 243 3 0.0
2 243 5 0.0
3 243 9 0.0
4 443 2 1.2
5 443 6 0.0
6 443 7 0.0
Upvotes: 2
Reputation: 88276
Here's one way checking for duplicates in id
and multiplying the boolean
result by value
:
df['value'] = (~df.id.duplicated('first')).mul(df.value)
id st value
0 243 1 2.4
1 243 3 0.0
2 243 5 0.0
3 243 9 0.0
4 443 2 1.2
5 443 6 0.0
6 443 7 0.0
Upvotes: 2
Reputation: 863166
Use Series.duplicated
for set values by mask with DataFrame.loc
or numpy.where
:
df.loc[df['id'].duplicated(), 'value'] = 0
Or:
df['value'] = np.where(df['id'].duplicated(), 0, df['value'])
print (df)
id st value
0 243 1 2.4
1 243 3 0.0
2 243 5 0.0
3 243 9 0.0
4 443 2 1.2
5 443 6 0.0
6 443 7 0.0
Upvotes: 1