Reputation: 125
I am trying to clean a very large data frame in Pandas.
I have columns labeled currentAge, currentAge2, sex, sex2, height, height2, weight, weight2. For some rows there is a value for currentAge and for other rows there is NaN for currentAge2. It can also go the other way where for some rows there is a value for currentAge2 and NaN for currentAge. This is true for the other metrics sex, weight, and height.
I want to merge the common metrics together like currentAge and currentAge2 so that there is one column currentAge with no NaNs.
How would I go about doing this?
Upvotes: 0
Views: 67
Reputation: 29635
You need to use fillna
. Let's consider this dataframe were some age are in column currentAge and other in currentAge2:
df = pd.DataFrame({'currentAge':[np.nan, 12, 15, 16, np.nan],
'currentAge2':[8, np.nan, np.nan, np.nan, 24]})
currentAge currentAge2
0 NaN 8.0
1 12.0 NaN
2 15.0 NaN
3 16.0 NaN
4 NaN 24.0
what you can do is then: df['currentAge'] = df['currentAge'].fillna(df['currentAge2'])
to fill all nan
in the first column by the values from the second, and then you get:
currentAge currentAge2
0 8.0 8.0
1 12.0 NaN
2 15.0 NaN
3 16.0 NaN
4 24.0 24.0
To remove the column currentAge2, then do df = df.drop('currentAge2',1)
and you will have a simple column currentAge filled.
Upvotes: 2