Reputation: 19
If I use the code below, it will keep the column that has NaNs (please see the attached pic). I have other columns that are similar. Is it possible to keep the second one instead of the first one?
data_final2 = data_final.loc[:, ~data_final.columns.duplicated()]
Upvotes: 0
Views: 52
Reputation: 59519
Use groupby on the columns and take the first value in each group; first() ignores nulls.
df.groupby(df.columns, axis=1).first()
import pandas as pd
import numpy as np
df = pd.DataFrame({'0': [1, 2, 3], '1': [np.nan]*3, '2': [np.nan]*3, '3': ['1x1', '2x2', '3x3']})
df.columns = ['Size', 'Size', 'Dims', 'Dims']
#   Size  Size  Dims Dims
#0     1   NaN   NaN  1x1
#1     2   NaN   NaN  2x2
#2     3   NaN   NaN  3x3
df.groupby(df.columns, axis=1).first()
#  Dims  Size
#0  1x1     1
#1  2x2     2
#2  3x3     3
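Recent pandas versions deprecate grouping with axis=1, so here is a sketch of the same idea written with a transpose instead. It assumes the same toy df as above; the infer_objects() call at the end is my own addition to restore numeric dtypes, since transposing a mixed-type frame leaves everything as object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'0': [1, 2, 3], '1': [np.nan]*3, '2': [np.nan]*3, '3': ['1x1', '2x2', '3x3']})
df.columns = ['Size', 'Size', 'Dims', 'Dims']

# Transpose so the duplicated labels become the index, group on that index,
# take the first non-null value in each group, then transpose back.
result = df.T.groupby(level=0).first().T

# Assumption: you want numeric columns back; the transpose made them object dtype.
result = result.infer_objects()
```

As with the original version, the resulting columns come out sorted by name (Dims, Size), not in their original order.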
Upvotes: 0
Reputation: 11105
If you only need a fix for this specific case, and you know that your desired column does not have NaNs:
data_final2 = data_final.dropna(axis=1)
Or rename the columns so that every name is unique, then select the ones you want:
data_final.columns = ['Site_nan', 'Site', 'Dimensions_nan', 'Dimensions']
data_final2 = data_final[['Site', 'Dimensions']].copy()
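If the column you want is always the later of each duplicated pair, the duplicated() call from the question can probably be kept as is by passing keep='last'. A minimal sketch, where the small data_final frame is a made-up stand-in for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_final: the first copy of each column is all NaN.
data_final = pd.DataFrame(
    [[np.nan, 'A', np.nan, '1x1'],
     [np.nan, 'B', np.nan, '2x2']],
    columns=['Site', 'Site', 'Dimensions', 'Dimensions'],
)

# keep='last' marks every occurrence except the last as a duplicate,
# so negating the mask keeps the last column of each name.
data_final2 = data_final.loc[:, ~data_final.columns.duplicated(keep='last')]
```

Note that this picks by position only; it does not check which copy actually contains the NaNs.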
Upvotes: 2