Change values from dataframe if two columns are the same

Question

I'm cleaning up a dataframe to train a machine learning model, and I found that some entries with the same value in column A have different values in column B. For example:

A	B
1234	foo
1234	bar

Since the value in column A is 1234 for both entries, the value in column B should be foo (or bar) in both cases.

I tried a brute force approach to this:

for index1, row1 in df.iterrows():
    for index2, row2 in df.iterrows():
        if row1['A'] == row2['A'] and row1['B'] != row2['B']:
            print('Found duplicated A with different B!')
            # Write through df.at: iterrows() yields copies, so assigning to row1 would not change df
            df.at[index1, 'B'] = row2['B']
            df.at[index1, 'C'] = df.at[index2, 'C'] = False

But probably there is an easier way to do this that I can't see. Does pandas have any methods to deal with this?

Quang Hoang · Accepted Answer

You can use groupby.transform('first') (or 'last') to give every row in an A group the first (or last) B value of that group:

df['B'] = df.groupby('A')['B'].transform('first')
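As a minimal sketch of how this behaves, here is the same call on a small made-up frame that mirrors the question's example. The sample rows (including the extra 5678 row) and the `conflict` variable are assumptions added for illustration; they are not part of the original post.

```python
import pandas as pd

# Hypothetical sample data: two rows share A == 1234 but disagree on B.
df = pd.DataFrame({
    "A": [1234, 1234, 5678],
    "B": ["foo", "bar", "baz"],
})

# Optional check (illustrative): mark rows whose A group holds more than one distinct B.
conflict = df.groupby("A")["B"].transform("nunique") > 1

# Broadcast the first B of each A group to every row in that group.
df["B"] = df.groupby("A")["B"].transform("first")

print(df)
```

After this runs, both 1234 rows carry foo and the 5678 row is unchanged. Note that 'first' picks the first non-null value in each group, so a missing B at the top of a group would be filled from a later row rather than propagated.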
