I have a DataFrame with lots of columns, and I want to remove rows where the values for some columns are null. I know how to do this with one column:
df = df[df['Column'] != '']
I want to do this with a set of columns, like so:
df = df['' not in [df['Column1'], df['Column2'], df['Column3']]]
However, this gives the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How do I do this?
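A minimal sketch of why the attempt above fails, assuming a small two-column frame. Python evaluates "x in some_list" by comparing each list element with x using ==. Here each element is a Series, so the comparison produces a boolean Series. Python then needs a single True/False from that Series, and pandas refuses to guess:

```python
import pandas as pd

# Small illustrative frame (assumed data, not from the question).
df = pd.DataFrame({'Column1': ['a', ''], 'Column2': ['b', 'c']})

raised = False
message = ''
try:
    # list membership runs Series == '' and then calls bool() on the
    # resulting boolean Series, which is what raises the ValueError.
    '' in [df['Column1'], df['Column2']]
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```

The fix is to keep the comparison vectorized and collapse each row explicitly with all or any, as in the answer below.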
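If the values are empty strings, select the subset of columns, compare it with the empty string, and reduce each row with all(axis=1) (keep rows where every value is non-empty), or with any(axis=1) negated by ~ (drop rows where some value is empty):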
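# keep rows where every selected column is non-empty
df = df[(df[['Column1', 'Column2', 'Column3']] != '').all(axis=1)]
# equivalent: drop rows where any selected column is empty
df = df[~(df[['Column1', 'Column2', 'Column3']] == '').any(axis=1)]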
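The same two filters can also be written with the method forms DataFrame.ne and DataFrame.eq, which avoid the extra parentheses. A small sketch, assuming a made-up frame with an extra column "Other" that is outside the subset and therefore ignored:

```python
import pandas as pd

df = pd.DataFrame({'Column1': ['x', '', 'y'],
                   'Column2': ['a', 'b', ''],
                   'Column3': ['p', 'q', 'r'],
                   'Other': ['', '', '']})   # not in the subset, so ignored

cols = ['Column1', 'Column2', 'Column3']

# ne('') is the method form of != ''; all(axis=1) keeps a row only
# when every selected column is non-empty.
kept = df[df[cols].ne('').all(axis=1)]

# By De Morgan's law this equals dropping rows where any selected
# column is empty.
kept_any = df[~df[cols].eq('').any(axis=1)]

print(kept)
```

Only row 0 survives here, because rows 1 and 2 each have an empty string in one of the selected columns.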
If the missing values are NaN or None, use dropna with the subset parameter:
df = df.dropna(subset=['Column1', 'Column2', 'Column3'])
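If a column can contain both empty strings and NaN/None, one option (a sketch, not part of the original answer) is to treat empty strings as missing only inside the subset and then require every value to be present. The frame below is assumed example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, '', 'p', 'hh', 'f'],
                   'B': ['', np.nan, '', 'x', 'o'],
                   'C': ['a', 's', 'd', 'f', 'g']})

cols = ['A', 'B']

# Replace '' with NaN on a copy of the subset, then notna() flags NaN
# and None alike. The values stored in df itself are left unchanged.
mask = df[cols].replace('', np.nan).notna().all(axis=1)
cleaned = df[mask]
print(cleaned)
```

Building a mask instead of calling dropna on the replaced frame keeps the original values in the returned rows.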
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, '', 'p', 'hh', 'f'],
                   'B': ['', np.nan, '', '', 'o'],
                   'C': ['a', 's', 'd', 'f', 'g'],
                   'D': ['f', 'g', 'h', 'j', 'k'],
                   'E': ['l', 'i', np.nan, 'u', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})
print (df)
     A    B  C  D    E    F
0  NaN       a  f    l
1       NaN  s  g    i
2    p       d  h  NaN    o
3   hh       f  j    u    i
4    f    o  g  k    o  NaN
df1 = df.dropna(subset=['A', 'B', 'F'])
print (df1)
    A  B  C  D    E  F
2   p     d  h  NaN  o
3  hh     f  j    u  i
df2 = df[(df[['A', 'B', 'F']] != '').all(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN
df2 = df[~(df[['A', 'B', 'F']] == '').any(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN
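Row 4 survives both empty-string filters even though F is NaN. The reason is that pandas treats a missing value as unequal to everything, so the != '' test returns True for it. A minimal check on an assumed three-element Series:

```python
import numpy as np
import pandas as pd

s = pd.Series(['', np.nan, 'o'])

# NaN != '' is True and NaN == '' is False, so NaN never counts as
# an empty string in these comparisons.
ne_result = (s != '').tolist()
eq_result = (s == '').tolist()
print(ne_result)  # [False, True, True]
print(eq_result)  # [True, False, False]
```

To drop such rows as well, the empty-string filter has to be combined with dropna or notna on the same subset.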
EDIT:
If some of the compared columns are numeric, comparing them with a string raises an error in older pandas versions:
TypeError: Could not compare [''] with block values
There are two workarounds: compare the underlying NumPy array obtained with values, or convert the values to strings with astype:
df = pd.DataFrame({'A': [np.nan, 7, 8, 8, 8],
                   'B': ['', np.nan, '', '', 'o'],
                   'C': ['a', 's', 'd', 'f', 'g'],
                   'D': ['f', 'g', 'h', 'j', 'k'],
                   'E': ['l', 'i', np.nan, 'u', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})
print (df)
     A    B  C  D    E    F
0  NaN       a  f    l
1  7.0  NaN  s  g    i
2  8.0       d  h  NaN    o
3  8.0       f  j    u    i
4  8.0    o  g  k    o  NaN
df2 = df[(df[['A', 'B', 'F']].values != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
df2 = df[(df[['A', 'B', 'F']].astype(str) != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
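The NumPy-array workaround can be wrapped in a small reusable function. The sketch below uses a hypothetical helper named drop_empty_rows (not a pandas function) and to_numpy(dtype=object), the newer spelling pandas recommends over values. Forcing object dtype keeps the comparison elementwise even when every selected column is numeric:

```python
import numpy as np
import pandas as pd


def drop_empty_rows(frame, cols):
    """Hypothetical helper: keep rows whose `cols` contain no empty string."""
    # An object array holds Python objects, so each cell is compared
    # with '' on its own; numbers and NaN simply compare as not equal.
    arr = frame[cols].to_numpy(dtype=object)
    return frame[(arr != '').all(axis=1)]


df = pd.DataFrame({'A': [np.nan, 7, 8, 8, 8],
                   'B': ['', np.nan, '', '', 'o'],
                   'F': ['', '', 'o', 'i', np.nan]})

kept = drop_empty_rows(df, ['A', 'B', 'F'])
print(kept)
```

As with the filters above, NaN is not treated as empty here, so row 4 is kept.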