Macterror

Reputation: 431

Changing the value of a pandas dataframe column based on duplicates

Let's say I have a pandas dataframe set up in the following way:

col1 | col2 | col3
1    | A    | 10
1    | A    | 10
3    | B    | 12

Is there a way to set col3 to 0 in every row whose col2 value has already appeared in an earlier row, leaving the first occurrence unchanged? I am looking for the following result:

col1 | col2 | col3
1    | A    | 10
1    | A    | 0
3    | B    | 12

I apologize for the confusing question; it was the best way I could describe it!

Upvotes: 3

Views: 46

Answers (2)

zipa

Reputation: 27869

You can use np.where together with Series.duplicated, which flags every occurrence of a value after the first:

import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': [1, 1, 3],
                   'col2': ['A', 'A', 'B'],
                   'col3': [10, 10, 12]})

df['col3'] = np.where(df['col2'].duplicated(), 0, df['col3'])

df

   col1 col2  col3
0     1    A    10
1     1    A     0
2     3    B    12
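To make the intermediate step visible, here is a minimal sketch that prints the boolean mask before applying it. It swaps np.where for Series.mask, which is an alternative to the code above rather than part of it; the variable name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 3],
                   'col2': ['A', 'A', 'B'],
                   'col3': [10, 10, 12]})

# True for every occurrence of a col2 value after its first appearance
# (keep='first' is the default for duplicated)
mask = df['col2'].duplicated()
print(mask.tolist())         # [False, True, False]

# Series.mask replaces the values where the condition is True
df['col3'] = df['col3'].mask(mask, 0)
print(df['col3'].tolist())   # [10, 0, 12]
```

Both approaches give the same result here; Series.mask simply avoids the extra numpy import.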

Upvotes: 1

yatu

Reputation: 88226

You can use DataFrame.duplicated with subset='col2' to build a boolean mask, and assign through .loc:

df.loc[df.duplicated(subset='col2'), 'col3'] = 0

    col1 col2  col3
0     1    A    10
1     1    A     0
2     3    B    12
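The repeated values do not need to sit in adjacent rows for this to work. A small sketch on made-up data (the five-row frame below is hypothetical, not from the question) suggests how the same line behaves when duplicates are interleaved:

```python
import pandas as pd

# Hypothetical data: the repeats of 'A' and 'B' are not adjacent
df = pd.DataFrame({'col1': [1, 3, 1, 3, 5],
                   'col2': ['A', 'B', 'A', 'B', 'C'],
                   'col3': [10, 12, 10, 12, 7]})

# Rows 2 and 3 repeat a col2 value seen earlier, so only they are zeroed
df.loc[df.duplicated(subset='col2'), 'col3'] = 0
print(df['col3'].tolist())   # [10, 12, 0, 0, 7]
```

Note that the assignment through .loc modifies df in place; work on df.copy() if the original values are still needed.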

Upvotes: 2
