Marcos Santana

Reputation: 951

Change values in a DataFrame when two rows share the same value in one column

I'm cleaning up a DataFrame to train a machine learning model, and I found that some rows with the same value in column A have different values in column B. For example:

A B
1234 foo
1234 bar

Since both rows have 1234 in column A, column B should hold the same value (either foo or bar) in both rows.
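A minimal sketch of the example as a DataFrame, assuming A holds integers and B holds strings. It expresses the consistency rule directly: every A value should map to exactly one distinct B value.

```python
import pandas as pd

# Example data from the question: one A value paired with two B values
df = pd.DataFrame({'A': [1234, 1234], 'B': ['foo', 'bar']})

# Number of distinct B values for each A value
b_per_a = df.groupby('A')['B'].nunique()

# The frame is consistent only if every A maps to exactly one B
is_consistent = bool((b_per_a == 1).all())
print(is_consistent)  # False
```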

I tried a brute-force approach:

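# Brute force: compare every pair of rows (O(n^2)).
# Use df.at: rows yielded by iterrows() are copies, so assigning to them never changes df.
for index1 in df.index:
    for index2 in df.index:
        if df.at[index1, 'A'] == df.at[index2, 'A'] and df.at[index1, 'B'] != df.at[index2, 'B']:
            print('Found duplicated A with different B!')
            df.at[index2, 'B'] = df.at[index1, 'B']  # '=' assigns; '==' only compares
            # assumes df already has a column C (not shown in the example)
            df.at[index1, 'C'] = False
            df.at[index2, 'C'] = False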
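The detection and flagging parts of the loop can be sketched without any Python-level loops. This assumes column C is a boolean flag that should be set to False on conflicting rows; the question's loop touches C but never shows it, so that meaning is a guess. `groupby(...).transform('nunique')` returns a Series aligned with the original rows, so it can be used directly as a row mask.

```python
import pandas as pd

# Example data; column C is a hypothetical boolean flag (its real meaning
# is not shown in the question)
df = pd.DataFrame({
    'A': [1234, 1234, 5678],
    'B': ['foo', 'bar', 'baz'],
    'C': [True, True, True],
})

# For every row, count how many distinct B values share its A value.
# transform() broadcasts the per-group count back to each row.
conflict = df.groupby('A')['B'].transform('nunique') > 1

if conflict.any():
    print('Found duplicated A with different B!')
    print(df[conflict])

# Flag all conflicting rows in one vectorized assignment
df.loc[conflict, 'C'] = False
print(df)
```

This runs in roughly linear time instead of comparing every pair of rows.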

There is probably an easier way that I'm missing. Does pandas have a method for this?

Upvotes: 0

Views: 84

Answers (1)

Quang Hoang

Reputation: 150745

You can use groupby(...).transform('first') (or 'last') to copy the first (or last) B value of each A group to every row in that group:

df['B'] = df.groupby('A')['B'].transform('first')
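A small sketch of the answer applied to the question's data plus one extra, hypothetical row, showing what 'first' and 'last' each produce before B is overwritten:

```python
import pandas as pd

df = pd.DataFrame({'A': [1234, 1234, 5678], 'B': ['foo', 'bar', 'baz']})

# Broadcast the first B seen in each A group back to every row of that group
first = df.groupby('A')['B'].transform('first')
print(first.tolist())  # ['foo', 'foo', 'baz']

# 'last' keeps the last B of each group instead
last = df.groupby('A')['B'].transform('last')
print(last.tolist())  # ['bar', 'bar', 'baz']

# Overwrite B so every A maps to exactly one B
df['B'] = first
print(df.groupby('A')['B'].nunique().max())  # 1
```

Which value survives depends on row order, so sort the DataFrame first if a particular B should win.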

Upvotes: 2
