Replace with first occurrence value for duplicate columns using pandas or python

Question

I have data like

ca ca ca 120.00

ca cc cd 130.00

ca ca ca 135.23

ca ha ca 60.00

ca ha ca 50.00

If the first 3 columns of a row match those of an earlier row, the fourth column should take the value from the first occurrence. I want data like

ca ca ca 120.00

ca cc cd 130.00

ca ca ca 120.00

ca ha ca 60.00

ca ha ca 60.00

Please help me solve this.

jezrael · Accepted Answer

Use GroupBy.transform with GroupBy.first, which returns each group's first value aligned to every row of that group.

A dynamic solution selects the first 3 columns by position as a list of grouping keys, then assigns the transformed 4th column back:

df.iloc[:, 3] = df.groupby(df.columns[:3].tolist())[df.columns[3]].transform('first')
print (df)
    0   1   2      3
0  ca  ca  ca  120.0
1  ca  cc  cd  130.0
2  ca  ca  ca  120.0
3  ca  ha  ca   60.0
4  ca  ha  ca   60.0
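
For anyone who wants to run the positional version end to end, here is a minimal sketch. It assumes the data is whitespace-separated text with no header row; the names raw, key_cols and value_col are only illustrative and are not part of the original answer.

```python
import io

import pandas as pd

# Sample data from the question (assumed to be whitespace-separated, no header).
raw = """\
ca ca ca 120.00
ca cc cd 130.00
ca ca ca 135.23
ca ha ca 60.00
ca ha ca 50.00
"""

# header=None gives integer column labels 0..3, as in the output above.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

key_cols = df.columns[:3].tolist()  # first three columns are the grouping keys
value_col = df.columns[3]           # fourth column holds the value to overwrite

# transform('first') returns a Series aligned to df's index,
# so it can be assigned straight back to the column.
df[value_col] = df.groupby(key_cols)[value_col].transform("first")
print(df)
```

Because the keys are picked by position, the same lines should work whatever the columns happen to be called.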

If the 4 columns are named a, b, c, d, the solution is simpler:

df['d'] = df.groupby(['a','b','c'])['d'].transform('first')
print (df)
    a   b   c      d
0  ca  ca  ca  120.0
1  ca  cc  cd  130.0
2  ca  ca  ca  120.0
3  ca  ha  ca   60.0
4  ca  ha  ca   60.0
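
If you would rather check which rows are affected before overwriting d, one possible variation is to write the result to a separate column first. This is only a sketch: the column name d_first and the variable changed are hypothetical and do not come from the question.

```python
import pandas as pd

# Same sample data, built directly with named columns.
df = pd.DataFrame({
    "a": ["ca", "ca", "ca", "ca", "ca"],
    "b": ["ca", "cc", "ca", "ha", "ha"],
    "c": ["ca", "cd", "ca", "ca", "ca"],
    "d": [120.00, 130.00, 135.23, 60.00, 50.00],
})

# Keep the original values and store the first-occurrence value alongside them.
df["d_first"] = df.groupby(["a", "b", "c"])["d"].transform("first")

# Rows whose value would be replaced by the first occurrence.
changed = df[df["d"] != df["d_first"]]
print(changed)
```

Once the rows in changed look right, assigning df['d'] = df['d_first'] gives the same result as the one-liner above.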
