Zanshin
Zanshin

Reputation: 1272

Subset columns with NaN values in Pandas

Searched and tried several answers here on SO, but they are all for returning rows with NaN's. I'd like to return only the columns with NaN values. For example the following df. How can I select columns 'A' and 'LG'?

df = pd.DataFrame(
        {'H': ['a','b', 'c'],
         'A': [np.nan,'d', 'e'],
         'LG':['AR1', 'RO1', np.nan],
         })

print(df)

     A  H   LG
0  NaN  a  AR1
1    d  b  RO1
2    e  c  NaN

Upvotes: 3

Views: 7331

Answers (2)

jezrael
jezrael

Reputation: 862751

I think you need first replace strings NaN to np.nan in sample:

df = pd.DataFrame(
        {'H': ['a','b', 'c'],
         'A': [np.nan,'d', 'e'],
         'LG':['AR1', 'RO1', np.nan],
         })

Then check by isnull and any:

mask = df.isnull().any()
print (mask)
A      True
H     False
LG     True
dtype: bool

Last use boolean indexing with index:

print (mask.index[mask])
Index(['A', 'LG'], dtype='object')

If need columns add loc:

print (df.loc[:, mask])
     A   LG
0  NaN  AR1
1    d  RO1
2    e  NaN

Upvotes: 4

Michael Gecht
Michael Gecht

Reputation: 1444

This returns all columns containing NaN.

df = pd.DataFrame(
        {'H': ['a','b', 'c'],
         'A': [np.nan,'d', 'e'],
         'LG':['AR1', 'RO1', np.nan],
         })

x = pd.isnull(df).any()

print(df[x.index[x]])

     A   LG
0  NaN  AR1
1    d  RO1
2    e  Na

Upvotes: 1

Related Questions