sebap123
sebap123

Reputation: 2685

Pandas - replacing NaN values using values from other columns

I've started to work with pandas recently and I am trying to do some simple cleaning in data. So far I've learned that there are a lot of one-liners when using pandas. So I am looking also for one in my case. I have DataFrame like this:

    C0          C1    C2         C3        C4
    3789507     2010   NaN        NaN        NaN  
    3789508      NaN   NaN       2010        NaN  
    3789509      NaN   NaN       2016        NaN  
    3789510      NaN  2014        NaN        NaN
    3789511      NaN   NaN        NaN      2014

I want to end up with something like this:

C0          C1
3789507     2010
3789508     2010
3789509     2016
3789510     2014
3789511     2014

What I am doing now is simply:

k = df.C1.isnull()
df.C1[k] = df.C2[k]
k = df.C1.isnull()
df.C1[k] = df.C3[k]
k = df.C1.isnull()
df.C1[k] = df.C4[k]

It is working, but is not the nicest solution. What if I'll ave 100 of columns? Is loop here only solution?

Just in case my for loop looks like this:

for i in range(2,len(df.columns)):
    k = df.C1.isnull()
    df.C1[k] = df.ix[:,i]

Upvotes: 0

Views: 49

Answers (1)

Marius
Marius

Reputation: 60060

Pandas has some methods to backfill and forwardfill missing values, so you can do:

df['C1'] = df.bfill(axis='columns')['C1']

df
Out[10]: 
        C0      C1      C2      C3      C4
0  3789507  2010.0     NaN     NaN     NaN
1  3789508  2010.0     NaN  2010.0     NaN
2  3789509  2016.0     NaN  2016.0     NaN
3  3789510  2014.0  2014.0     NaN     NaN
4  3789511  2014.0     NaN     NaN  2014.0

This works for your simple example data but in a real dataset you may have to restrict the columns that are being backfilled over like:

fill_cols = ['C1', 'C2', 'C3', 'C4']
df['C1'] = df[fill_cols].bfill(axis='columns')['C1']

Upvotes: 1

Related Questions