Merge columns with NaN

Question

I am trying to clean a very large data frame in Pandas.

I have columns labeled currentAge, currentAge2, sex, sex2, height, height2, weight, weight2. For some rows there is a value for currentAge and for other rows there is NaN for currentAge2. It can also go the other way where for some rows there is a value for currentAge2 and NaN for currentAge. This is true for the other metrics sex, weight, and height.

I want to merge the common metrics together like currentAge and currentAge2 so that there is one column currentAge with no NaNs.

How would I go about doing this?

Ben.T · Accepted Answer

You need to use fillna. Let's consider this dataframe were some age are in column currentAge and other in currentAge2:

df = pd.DataFrame({'currentAge':[np.nan, 12, 15, 16, np.nan], 
                   'currentAge2':[8, np.nan, np.nan, np.nan, 24]})
   currentAge  currentAge2
0         NaN          8.0
1        12.0          NaN
2        15.0          NaN
3        16.0          NaN
4         NaN         24.0

what you can do is then: df['currentAge'] = df['currentAge'].fillna(df['currentAge2']) to fill all nan in the first column by the values from the second, and then you get:

   currentAge  currentAge2
0         8.0          8.0
1        12.0          NaN
2        15.0          NaN
3        16.0          NaN
4        24.0         24.0

To remove the column currentAge2, then do df = df.drop('currentAge2',1) and you will have a simple column currentAge filled.

Merge columns with NaN

Answers (1)

Related Questions