Reputation: 63
I have a DataFrame that looks like this:
ColA | ColB | ColC | ColD |
-----|------|------|------|
100 | A | X1 | NaN |
200 | B | X2 | AAA |
300 | C | X3 | NaN |
I want to fill the missing values in ColD based on the value in ColA. The result I need is:
if the value in ColA is 100, then the value in ColD should be "BBB"
if the value in ColA is 300, then the value in ColD should be "CCC"
ColA | ColB | ColC | ColD |
-----|------|------|------|
100 | A | X1 | BBB |
200 | B | X2 | AAA |
300 | C | X3 | CCC |
Upvotes: 6
Views: 6538
Reputation: 111
Define a mapping function:
def my_map_func(x):
    # Only applied to rows where ColD is NaN: 100 -> "BBB", anything else -> "CCC"
    return "BBB" if x == 100 else "CCC"
Right now, df looks like:
ColA | ColB | ColC | ColD
-----|------|------|-----
100 | A | X1 | NaN
200 | B | X2 | AAA
300 | C | X3 | NaN
Select the rows where ColD is NaN and fill them with the mapped values obtained from ColA (df.ix was deprecated in pandas 0.20 and removed in 1.0, so df.loc is used here):
df.loc[df.ColD.isnull(), 'ColD'] = df.loc[df.ColD.isnull(), 'ColA'].apply(my_map_func)
Here we select only the rows for which ColD is NaN by indexing with a boolean Series, and pick the column we are interested in, ColA. In short: df.loc[selected_rows, selected_columns].
Now, dataframe df looks like:
ColA | ColB | ColC | ColD
-----|------|------|-----
100 | A | X1 | BBB
200 | B | X2 | AAA
300 | C | X3 | CCC
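For reference, the steps above can be put together into one self-contained script. This is a minimal sketch, assuming the sample data from the question is built by hand with pd.DataFrame; the variable name missing is only introduced here for readability and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (constructed by hand for illustration)
df = pd.DataFrame({
    "ColA": [100, 200, 300],
    "ColB": ["A", "B", "C"],
    "ColC": ["X1", "X2", "X3"],
    "ColD": [np.nan, "AAA", np.nan],
})

def my_map_func(x):
    # Only called for rows where ColD is NaN
    return "BBB" if x == 100 else "CCC"

# Boolean mask of the rows whose ColD is missing (hypothetical helper name)
missing = df["ColD"].isnull()

# Fill only those rows, using the value of ColA in the same row
df.loc[missing, "ColD"] = df.loc[missing, "ColA"].apply(my_map_func)
print(df)
```

Computing the mask once and reusing it on both sides keeps the row selection consistent and avoids evaluating isnull() twice.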
Upvotes: 1
Reputation: 863531
You can use combine_first or fillna:
df.ColD = df.ColD.combine_first(df.ColA)
print (df)
ColA ColB ColC ColD
0 100 A X1 100
1 200 B X2 AAA
2 300 C X3 300
Or:
df.ColD = df.ColD.fillna(df.ColA)
print (df)
ColA ColB ColC ColD
0 100 A X1 100
1 200 B X2 AAA
2 300 C X3 300
EDIT: First use map to build a Series s, then pass that Series to combine_first or fillna:
d = {100: "BBB", 300:'CCC'}
s = df.ColA.map(d)
print (s)
0 BBB
1 NaN
2 CCC
Name: ColA, dtype: object
df.ColD = df.ColD.combine_first(s)
print (df)
ColA ColB ColC ColD
0 100 A X1 BBB
1 200 B X2 AAA
2 300 C X3 CCC
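The same idea also works with fillna in place of combine_first. The following is a minimal sketch, assuming the sample data from the question is built by hand; it is meant as an illustration of the approach, not the only way to write it.

```python
import numpy as np
import pandas as pd

# Sample data from the question (constructed by hand for illustration)
df = pd.DataFrame({
    "ColA": [100, 200, 300],
    "ColB": ["A", "B", "C"],
    "ColC": ["X1", "X2", "X3"],
    "ColD": [np.nan, "AAA", np.nan],
})

# Mapping from ColA values to the replacement for a missing ColD
d = {100: "BBB", 300: "CCC"}

# map() yields NaN for keys not in d (here 200); fillna() then fills
# only the missing entries of ColD, aligned on the index
df["ColD"] = df["ColD"].fillna(df["ColA"].map(d))
print(df["ColD"].tolist())  # ['BBB', 'AAA', 'CCC']
```

Because 200 is not a key in d, the mapped Series has NaN in that row, but ColD already holds "AAA" there, so nothing is lost.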
It replaces only NaN values; existing values are left unchanged:
print (df)
ColA ColB ColC ColD
0 100 A X1 EEE <- changed value to EEE
1 200 B X2 AAA
2 300 C X3 NaN
d = {100: "BBB", 300:'CCC'}
s = df.ColA.map(d)
df.ColD = df.ColD.combine_first(s)
print (df)
ColA ColB ColC ColD
0 100 A X1 EEE
1 200 B X2 AAA
2 300 C X3 CCC
Upvotes: 6