Reputation: 297
I need to remove columns that have duplicated names, but only remove the duplicate that has all row values as 'nan' (as a string, not NaN). The example data and expected output are below. Thanks so much!
df = pd.DataFrame({'id':[1,2,3,4],'a':[0,0,0,'nan'], 'b':['nan','nan','nan','nan'], 'c':['nan','nan','nan','nan'], 'd':[1,'nan',0,2]})
df = df.rename(columns = {'a':'a','b':'a', 'c':'b', 'd':'b'})
id a a b b
0 1 0 nan nan 1
1 2 0 nan nan nan
2 3 0 nan nan 0
3 4 nan nan nan 2
expected output
id a b
0 1 0 1
1 2 0 nan
2 3 0 0
3 4 nan 2
Upvotes: 1
Views: 234
Reputation: 9197
The question is, if you want to keep columns which are empty, but unique. If not you can solve it in a single line: df1 = df.dropna(how='all', axis='columns')
If you want to keep them you can drop them and then re-create them because they are essentially just empty after all.
Input:
df = pd.DataFrame({'id':[1,2,3,4],'a':[0,0,0,np.nan], 'b':[np.nan,np.nan,np.nan,np.nan], 'c':[np.nan,np.nan,np.nan,np.nan], 'd':[1,np.nan,0,2], 'e':[np.nan,np.nan,np.nan,np.nan]})
df = df.rename(columns = {'a':'a','b':'a', 'c':'b', 'd':'b'})
Code:
df1 = df.dropna(how='all', axis='columns').copy()
df1[list(df.columns.difference(df1.columns))] = np.nan
Out:
# id a b e
# 0 1 0.0 1.0 NaN
# 1 2 0.0 NaN NaN
# 2 3 0.0 0.0 NaN
# 3 4 NaN 2.0 NaN
Upvotes: 2