Reputation: 2243
Given a dataframe with similar columns having null values in between. How to dynamically fill nulls in the columns with non-null values from other columns without explicitly stating the names of other column names e.g. select first column category1
and fill the null rows with values from other columns of same rows?
data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016,2017, 2018, 2019],
'category1': [None, 21, None, 10, None, 30, 31,45, 23, 56],
'category2': [10, 21, 20, 10, None, 30, None,45, 23, 56],
'category3': [10, 21, 20, 10, None, 30, 31,45, 23, 56],}
df = pd.DataFrame(data)
df = df.set_index('year')
df
category1 category2 category3
year
2010 NaN 10 10
2011 21 21 21
2012 NaN 20 20
2013 10 10 10
2014 NaN NaN NaN
2015 30 30 NaN
2016 31 NaN 31
2017 45 45 45
2018 23 23 23
2019 56 56 56
After filling category1
:
category1 category2 category3
year
2010 10 10 10
2011 21 21 21
2012 20 20 20
2013 10 10 10
2014 NaN NaN NaN
2015 30 30 NaN
2016 31 NaN 31
2017 45 45 45
2018 23 23 23
2019 56 56 56
Upvotes: 3
Views: 1405
Reputation: 25629
Try This:
df['category1']= df['category1'].fillna(df.median(axis=1))
Upvotes: 0
Reputation: 862511
You can use first_valid_index
with condition if all values are NaN
:
def f(x):
if x.first_valid_index() is None:
return None
else:
return x[x.first_valid_index()]
df['a'] = df.apply(f, axis=1)
print (df)
category1 category2 category3 a
year
2010 NaN 10.0 10.0 10.0
2011 21.0 21.0 21.0 21.0
2012 NaN 20.0 20.0 20.0
2013 10.0 10.0 10.0 10.0
2014 NaN NaN NaN NaN
2015 30.0 30.0 30.0 30.0
2016 31.0 NaN 31.0 31.0
2017 45.0 45.0 45.0 45.0
2018 23.0 23.0 23.0 23.0
2019 56.0 56.0 56.0 56.0
Upvotes: 1
Reputation: 210832
IIUC you can do it this way:
In [369]: df['category1'] = df['category1'].fillna(df['category2'])
In [370]: df
Out[370]:
category1 category2 category3
year
2010 10.0 10.0 10.0
2011 21.0 21.0 21.0
2012 20.0 20.0 20.0
2013 10.0 10.0 10.0
2014 NaN NaN NaN
2015 30.0 30.0 30.0
2016 31.0 NaN 31.0
2017 45.0 45.0 45.0
2018 23.0 23.0 23.0
2019 56.0 56.0 56.0
Upvotes: 1