Mathemilda

Reputation: 129

pandas: dropping multiple columns and keeping only ones with numeric data

I need to clean my data frame and remove all columns without numeric data. I have columns which are classified as "object", and some marked as int/float but containing mostly NaNs. I would like to keep only columns filled with numbers. Is there a way to do it?

Upvotes: 3

Views: 63

Answers (1)

EdChum

Reputation: 394031

Use select_dtypes and pass np.number to filter numeric types only:

In [69]:
import numpy as np
import pandas as pd
df = pd.DataFrame({'int':np.arange(5), 'float':np.random.randn(5), 'str':list('abcde')})
df

Out[69]:
      float  int str
0  0.987218    0   a
1  0.336119    1   b
2  1.800194    2   c
3  4.566850    3   d
4 -0.306808    4   e

In [71]:
df.select_dtypes([np.number])

Out[71]:
      float  int
0  0.987218    0
1  0.336119    1
2  1.800194    2
3  4.566850    3
4 -0.306808    4

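select_dtypes accepts any type in the NumPy dtype hierarchy, so you can pass a narrower type than np.number if you only want, say, integers or floats.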
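
As a rough sketch of what the dtype hierarchy allows (the frame and the variable names here are made up for illustration), narrower abstract types such as np.integer or np.floating can be passed in place of np.number, and exclude works the same way:

```python
import numpy as np
import pandas as pd

# Illustrative frame, same shape as the one in the answer
df = pd.DataFrame({'int': np.arange(5),
                   'float': np.random.randn(5),
                   'str': list('abcde')})

# np.integer matches any integer dtype (int32, int64, ...)
ints_only = df.select_dtypes(include=[np.integer])

# np.floating matches any float dtype (float32, float64, ...)
floats_only = df.select_dtypes(include=[np.floating])

# exclude does the opposite: everything that is not numeric
non_numeric = df.select_dtypes(exclude=[np.number])
```

Note that object columns which merely happen to hold numbers are not picked up by np.number; only the column dtype is checked.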

To remove columns that contain any NaNs, you can then call dropna(axis=1) on the result (thanks @Leb).
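
Putting the two steps together might look like the sketch below. The frame, the column names and the 75% cutoff are assumptions chosen for illustration, not something from the question. The thresh argument of dropna is one way to handle columns that are "mostly NaNs" without dropping a column for a single missing value:

```python
import numpy as np
import pandas as pd

# Illustrative frame: one complete numeric column, one mostly-NaN column,
# one numeric column with a single gap, and one object column
df = pd.DataFrame({
    'full': [1.0, 2.0, 3.0, 4.0],
    'sparse': [np.nan, np.nan, np.nan, 1.0],
    'one_gap': [1.0, np.nan, 3.0, 4.0],
    'label': list('abcd'),
})

# Step 1: keep only columns with a numeric dtype
numeric = df.select_dtypes(include=[np.number])

# Step 2a: strict, drop every column that contains at least one NaN
strict = numeric.dropna(axis=1)

# Step 2b: lenient, keep columns with at least min_count non-NaN values
# (0.75 is an arbitrary cutoff picked for this example)
min_count = int(0.75 * len(numeric))
lenient = numeric.dropna(axis=1, thresh=min_count)
```

Here strict keeps only 'full', while lenient keeps 'full' and 'one_gap' and drops 'sparse'.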

Upvotes: 3
