Remove rows if any of a set of values are null

Question

I have a DataFrame with lots of columns, and I want to remove rows where the values for some columns are null. I know how to do this with one column:

df = df[df['Column'] != '']

I want to do this with a set of columns, like so:

df = df['' not in [df['Column1'], df['Column2'], df['Column3']]]

However, this gives the error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How do I do this?

jezrael · Accepted Answer

If the values are empty strings, select the subset of columns, compare it with '' to get a boolean mask, and reduce that mask per row with all or any:

df = df[(df[['Column1', 'Column2', 'Column3']] != '').all(axis=1)]

df = df[~(df[['Column1', 'Column2', 'Column3']] == '').any(axis=1)]
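
To make the intermediate steps visible, here is a minimal sketch on a made-up frame. The data, the extra `Other` column, and the names `cols`, `not_empty`, `keep` and `also_keep` are assumptions for illustration only, not part of the question:

```python
import pandas as pd

# Hypothetical data: only Column1..Column3 are checked, 'Other' is ignored
df = pd.DataFrame({'Column1': ['a', '', 'c'],
                   'Column2': ['x', 'y', ''],
                   'Column3': ['1', '2', '3'],
                   'Other':   ['', '', '']})

cols = ['Column1', 'Column2', 'Column3']

not_empty = df[cols] != ''                   # boolean DataFrame, one cell per value
keep = not_empty.all(axis=1)                 # boolean Series, one entry per row
also_keep = ~(df[cols] == '').any(axis=1)    # same mask, written the other way round

filtered = df[keep]                          # rows with no '' in the checked columns
print(filtered)
```

Both masks should select the same rows: "every checked value is non-empty" is the negation of "at least one checked value is empty".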

And if the values are NaN or None, use dropna with the subset parameter:

df = df.dropna(subset=['Column1', 'Column2', 'Column3'])

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[np.nan,'','p','hh','f'],
                   'B':['',np.nan,'','','o'],
                   'C':['a','s','d','f','g'],
                   'D':['f','g','h','j','k'],
                   'E':['l','i',np.nan,'u','o'],
                   'F':['','','o','i',np.nan]})

print (df)
     A    B  C  D    E    F
0  NaN       a  f    l     
1       NaN  s  g    i     
2    p       d  h  NaN    o
3   hh       f  j    u    i
4    f    o  g  k    o  NaN

df1 = df.dropna(subset=['A', 'B', 'F'])
print (df1)
    A  B  C  D    E  F
2   p     d  h  NaN  o
3  hh     f  j    u  i

df2 = df[(df[['A', 'B', 'F']] != '').all(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN

df2 = df[~(df[['A', 'B', 'F']] == '').any(axis=1)]
print (df2)
   A  B  C  D  E    F
4  f  o  g  k  o  NaN
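
If both empty strings and NaN should count as missing, the two checks can probably be combined into one mask. This is only a sketch and assumes the checked columns hold strings; the data and the names `cols`, `missing` and `df3` are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data mixing NaN and empty strings
df = pd.DataFrame({'A': [np.nan, '', 'p', 'hh', 'f'],
                   'B': ['x', np.nan, '', 'y', 'o'],
                   'C': ['a', 's', 'd', 'f', 'g']})

cols = ['A', 'B']

# True where a checked cell is NaN/None or an empty string
missing = df[cols].isna() | df[cols].eq('')

# Keep only rows with no missing cell in the checked columns
df3 = df[~missing.any(axis=1)]
print(df3)
```

Building the mask separately leaves the original values untouched, so nothing in the kept rows is rewritten.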

EDIT:

If you compare with a string and some of the columns are numeric, you get:

TypeError: Could not compare [''] with block values

There are two solutions: compare the NumPy array returned by values, or convert the values to strings with astype:

df = pd.DataFrame({'A':[np.nan,7,8,8,8],
                   'B':['',np.nan,'','','o'],
                   'C':['a','s','d','f','g'],
                   'D':['f','g','h','j','k'],
                   'E':['l','i',np.nan,'u','o'],
                   'F':['','','o','i',np.nan]})

print (df)
     A    B  C  D    E    F
0  NaN       a  f    l     
1  7.0  NaN  s  g    i     
2  8.0       d  h  NaN    o
3  8.0       f  j    u    i
4  8.0    o  g  k    o  NaN

df2 = df[(df[['A', 'B', 'F']].values != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN

df2 = df[(df[['A', 'B', 'F']].astype(str) != '').all(axis=1)]
print (df2)
     A  B  C  D  E    F
4  8.0  o  g  k  o  NaN
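
As a rough illustration of why the values approach works: for a selection that mixes numeric and string columns, values returns a 2-D NumPy array of dtype object, and the comparison is then done element by element in NumPy instead of by pandas. The small frame and the names `arr` and `mask` are assumptions for this sketch:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and two string columns
df = pd.DataFrame({'A': [np.nan, 7, 8],
                   'B': ['', np.nan, 'o'],
                   'F': ['', '', 'x']})

arr = df[['A', 'B', 'F']].values   # mixed columns -> object array
print(arr.dtype)

# Elementwise comparison in NumPy, then reduce per row
mask = (arr != '').all(axis=1)
print(df[mask])
```

Numbers and NaN never compare equal to '', so only the empty strings produce False in the mask.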
