Student

Reputation: 1197

Does NumPy have a select_dtypes like Pandas?

Objective: to select data by dtype in NumPy, similar to what Pandas does with "select_dtypes".

Setting up a dataframe like the following:

>>> df = pd.DataFrame({'a': [1, 2] * 3,
...                    'b': [True, False] * 3,
...                    'c': [1.0, 2.0] * 3})
>>> df
   a      b    c
0  1   True  1.0
1  2  False  2.0
2  1   True  1.0
3  2  False  2.0
4  1   True  1.0
5  2  False  2.0

I am looking for something like this but with NumPy:

>>> df.select_dtypes(include=['float64'])
   c
0  1.0
1  2.0
2  1.0
3  2.0
4  1.0
5  2.0

Any help would be appreciated.

Upvotes: 2

Views: 840

Answers (1)

Szymon Maszke

Reputation: 24691

NumPy arrays are homogeneous: all elements share the same underlying type. They are essentially C arrays, so a single data type applies to every element.

You can check it with the .dtype attribute, like so:

import numpy as np

a = np.array([1.5, 2, 3])
print(a.dtype)

This prints float64, even though two of the elements were written as ints.
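
If what you actually need is one dtype per column, a structured array is one possible way to get closer to select_dtypes. The following is a minimal sketch, not a built-in NumPy feature: the helper select_fields_by_dtype and the field names a, b, c are made up for illustration and simply mirror the DataFrame in the question.

```python
import numpy as np

# Structured array with one dtype per field, mirroring the question's columns.
records = np.zeros(6, dtype=[("a", np.int64), ("b", np.bool_), ("c", np.float64)])
records["a"] = [1, 2] * 3
records["b"] = [True, False] * 3
records["c"] = [1.0, 2.0] * 3


def select_fields_by_dtype(arr, include):
    """Hypothetical helper: keep only the fields whose dtype is listed in `include`."""
    wanted = [np.dtype(t) for t in include]
    names = [name for name in arr.dtype.names if arr.dtype[name] in wanted]
    return arr[names]


floats = select_fields_by_dtype(records, ["float64"])
print(floats.dtype.names)  # ('c',)
print(floats["c"])
```

This only works because the dtype is stored per field; a plain 2D array built from the same data would be upcast to a single dtype.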

Checking whether a given float could have been an int (like 2 and 3 in the example above) is something you shouldn't do, as floating-point precision might be an issue.

If you really insist, you can use np.isclose to get a boolean array indicating whether each float element is close enough to its floored int counterpart; those elements might be castable without too much loss of precision:

# For the example above, i.e. a = [1.5, 2, 3]
print(np.isclose(np.floor(a), a))

This gives [False, True, True], meaning the second and third elements could be cast. Once again, I advise you not to do so.
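
To illustrate why this is fragile, here is a small sketch comparing the floor-based check with a variant that uses np.round. The helper name integral_mask and the array b are assumptions made up for this example; the point is only that a value sitting just below an integer is missed by np.floor.

```python
import numpy as np

a = np.array([1.5, 2, 3])
b = np.array([1.0, 2.0, 2.9999999999])  # last value is just below 3


def integral_mask(x):
    """Hypothetical helper: True where x is within np.isclose tolerance of the nearest integer."""
    return np.isclose(x, np.round(x))


floor_mask_b = np.isclose(np.floor(b), b)  # floor(2.9999999999) == 2.0, so the last entry fails
round_mask_b = integral_mask(b)            # rounding to nearest catches it

print(integral_mask(a))
print(floor_mask_b)
print(round_mask_b)

# Only cast when every element passed the check.
if round_mask_b.all():
    b_int = np.round(b).astype(np.int64)
    print(b_int)
```

Either way the result depends on the tolerances of np.isclose, so treat it as a heuristic rather than a way to recover the original types.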

EDIT: If you have a boolean NumPy array that was cast to float (e.g. np.float64), there is no way to get it back, as you cannot tell a bool cast to float apart from an int cast to float when the int is either 0 or 1.
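
A quick sketch of that last point, using two made-up arrays (bools and ints) that end up identical once both are cast to float64:

```python
import numpy as np

bools = np.array([True, False, True])
ints = np.array([1, 0, 1])

bools_as_float = bools.astype(np.float64)
ints_as_float = ints.astype(np.float64)

# After the cast, both arrays have the same dtype and the same values,
# so nothing is left to tell which one started out as booleans.
print(bools_as_float.dtype, ints_as_float.dtype)
print(np.array_equal(bools_as_float, ints_as_float))
```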

Upvotes: 3
