Archit
Archit

Reputation: 588

Taking the first records for each group in pandas dataframe and putting 0 in other records

I have a pandas dataframe df:

s = {'id': [243,243, 243, 243,443,443,443],
 'st': [1,3,5,9,2,6,7],
 'value':[2.4, 3.8, 3.7, 5.6, 1.2, 0.2, 2.1]}
df = pd.DataFrame(s)

which looks like this:

    id  st  value
0  243   1    2.4
1  243   3    3.8
2  243   5    3.7
3  243   9    5.6
4  443   2    1.2
5  443   6    0.2
6  443   7    2.1

I want to put 0 as value for all the records except of the first records for each id . My expected output is:

    id  st  value
0  243   1    2.4
1  243   3    0
2  243   5    0
3  243   9    0
4  443   2    1.2
5  443   6    0
6  443   7    0

How can I do this with a pandas dataframe?

Upvotes: 5

Views: 71

Answers (3)

anky
anky

Reputation: 75100

Another way of doing this is:

df.loc[df.id.eq(df.id.shift()),'value']=0
print(df)

    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0

Upvotes: 2

yatu
yatu

Reputation: 88276

Here's one way checking for duplicates in id and multiplying the boolean result by value:

df['value'] = (~df.id.duplicated('first')).mul(df.value)

    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0

Upvotes: 2

jezrael
jezrael

Reputation: 863166

Use Series.duplicated for set values by mask with DataFrame.loc or numpy.where:

df.loc[df['id'].duplicated(), 'value'] = 0

Or:

df['value'] = np.where(df['id'].duplicated(), 0, df['value'])
print (df)
    id  st  value
0  243   1    2.4
1  243   3    0.0
2  243   5    0.0
3  243   9    0.0
4  443   2    1.2
5  443   6    0.0
6  443   7    0.0

Upvotes: 1

Related Questions