Reputation: 2685
I've started to work with pandas recently and I am trying to do some simple cleaning in data. So far I've learned that there are a lot of one-liners when using pandas. So I am looking also for one in my case. I have DataFrame
like this:
C0 C1 C2 C3 C4
3789507 2010 NaN NaN NaN
3789508 NaN NaN 2010 NaN
3789509 NaN NaN 2016 NaN
3789510 NaN 2014 NaN NaN
3789511 NaN NaN NaN 2014
I want to end up with something like this:
C0 C1
3789507 2010
3789508 2010
3789509 2016
3789510 2014
3789511 2014
What I am doing now is simply:
k = df.C1.isnull()
df.C1[k] = df.C2[k]
k = df.C1.isnull()
df.C1[k] = df.C3[k]
k = df.C1.isnull()
df.C1[k] = df.C4[k]
It is working, but is not the nicest solution. What if I'll ave 100 of columns? Is loop here only solution?
Just in case my for loop looks like this:
for i in range(2,len(df.columns)):
k = df.C1.isnull()
df.C1[k] = df.ix[:,i]
Upvotes: 0
Views: 49
Reputation: 60060
Pandas has some methods to backfill and forwardfill missing values, so you can do:
df['C1'] = df.bfill(axis='columns')['C1']
df
Out[10]:
C0 C1 C2 C3 C4
0 3789507 2010.0 NaN NaN NaN
1 3789508 2010.0 NaN 2010.0 NaN
2 3789509 2016.0 NaN 2016.0 NaN
3 3789510 2014.0 2014.0 NaN NaN
4 3789511 2014.0 NaN NaN 2014.0
This works for your simple example data but in a real dataset you may have to restrict the columns that are being backfilled over like:
fill_cols = ['C1', 'C2', 'C3', 'C4']
df['C1'] = df[fill_cols].bfill(axis='columns')['C1']
Upvotes: 1