Pete Populii

Reputation: 63

Fill missing value based on value from another column in the same row

I have a DataFrame that looks like this:

ColA | ColB | ColC | ColD |
-----|------|------|------|
100  |   A  |  X1  |  NaN |
200  |   B  |  X2  |  AAA |
300  |   C  |  X3  |  NaN |

I want to fill the missing values in ColD based on the value in ColA. The result I need is:

if value in ColA = 100 then value in ColD = "BBB"
if value in ColA = 300 then value in ColD = "CCC"

ColA | ColB | ColC | ColD |
-----|------|------|------|
100  |   A  |  X1  |  BBB |
200  |   B  |  X2  |  AAA |
300  |   C  |  X3  |  CCC |

Upvotes: 6

Views: 6538

Answers (2)

Prakriti Gupta

Reputation: 111

Define a mapping function:

def my_map_func(x):
    return "BBB" if x==100 else "CCC"

Right now, df looks like:

ColA | ColB | ColC | ColD
-----|------|------|-----
100  |    A |   X1 |  NaN
200  |    B |   X2 |  AAA
300  |    C |   X3 |  NaN

Select the rows that have NaN in ColD and fill them with the mapped value obtained from ColA:

df.loc[df.ColD.isnull(), 'ColD'] = df.loc[df.ColD.isnull(), 'ColA'].apply(my_map_func)

Here, we select only the rows for which ColD is NaN by indexing with a boolean Series, and then pick the column we are interested in (ColA on the right-hand side, ColD on the left). In simple language, df.loc[selected_rows, selected_columns].

Now, dataframe df looks like:

ColA | ColB | ColC | ColD
-----|------|------|-----
100  |    A |   X1 |  BBB
200  |    B |   X2 |  AAA
300  |    C |   X3 |  CCC
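
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the steps above. The DataFrame construction is my own reconstruction of the table in the question, and the `missing` variable name is just for illustration.

```python
import numpy as np
import pandas as pd

# Reconstruction of the example table from the question (assumed dtypes).
df = pd.DataFrame({
    "ColA": [100, 200, 300],
    "ColB": ["A", "B", "C"],
    "ColC": ["X1", "X2", "X3"],
    "ColD": [np.nan, "AAA", np.nan],
})

def my_map_func(x):
    # Anything other than 100 falls through to "CCC".
    return "BBB" if x == 100 else "CCC"

# Boolean mask of the rows where ColD is missing.
missing = df["ColD"].isnull()

# Fill only those rows, using the value of ColA in the same row.
df.loc[missing, "ColD"] = df.loc[missing, "ColA"].apply(my_map_func)
print(df)
```

Note that my_map_func returns "CCC" for every ColA value other than 100, so with more distinct values in ColA you would probably want an explicit lookup instead.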

Upvotes: 1

jezrael

Reputation: 863531

You can use combine_first or fillna to fill the NaN values in ColD with the values from ColA:

df.ColD = df.ColD.combine_first(df.ColA)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  100
1   200    B   X2  AAA
2   300    C   X3  300

Or:

df.ColD = df.ColD.fillna(df.ColA)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  100
1   200    B   X2  AAA
2   300    C   X3  300

EDIT: To fill with mapped values instead, first create a Series s with map, then pass it to combine_first or fillna:

d = {100: "BBB", 300:'CCC'}
s = df.ColA.map(d)
print (s)
0    BBB
1    NaN
2    CCC
Name: ColA, dtype: object

df.ColD = df.ColD.combine_first(s)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  BBB
1   200    B   X2  AAA
2   300    C   X3  CCC

It replaces only NaN values; existing values in ColD are left unchanged:

print (df)
   ColA ColB ColC ColD
0   100    A   X1  EEE <- changed value to EEE
1   200    B   X2  AAA
2   300    C   X3  NaN

d = {100: "BBB", 300:'CCC'}
s = df.ColA.map(d)
df.ColD = df.ColD.combine_first(s)
print (df)
   ColA ColB ColC ColD
0   100    A   X1  EEE
1   200    B   X2  AAA
2   300    C   X3  CCC
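
A self-contained sketch of the map approach, this time with fillna. The DataFrame is built by hand here, and the extra row with ColA = 400 is a hypothetical addition (not in the question) to show what happens when a key is missing from the dict: map returns NaN for it, so ColD stays NaN in that row.

```python
import numpy as np
import pandas as pd

# Hand-built sample; the row with ColA == 400 is a hypothetical extra case.
df = pd.DataFrame({
    "ColA": [100, 200, 300, 400],
    "ColD": ["EEE", "AAA", np.nan, np.nan],
})

d = {100: "BBB", 300: "CCC"}

# map looks up each ColA value in d; keys not present in d become NaN.
s = df["ColA"].map(d)

# fillna aligns on the index and only touches rows where ColD is NaN.
df["ColD"] = df["ColD"].fillna(s)
print(df)
```

The row with 100 keeps its existing EEE, the row with 300 gets CCC, and the row with 400 remains NaN because there is no entry for it in d.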

Upvotes: 6
