Reputation: 35
I would like to create a dictionary using pandas.Series.dtype as keys. However, this does not work. See this example.
import pandas as pd
from pandas import DataFrame
import numpy as np
switcher = {
    np.int64: "It's an int!",
    np.float64: "It's a float!",
}
df = DataFrame({'x1': [1,2,3,4], 'x2': [1.0, 2.0, 3.0, 4.0]})
x1_dtype = df['x1'].dtype
print(switcher.get(np.int64))
# prints: It's an int!
print(switcher.get(np.int64, "Key not found!"))
# prints: It's an int!
print(switcher.get(x1_dtype, "Key not found!"))
# prints: Key not found!
print(x1_dtype == np.int64)
# prints: True
I don't understand the result of the last two print statements. Apparently x1_dtype == np.int64 is True, yet switcher.get(x1_dtype) does not find a matching key. After some time I realized that the root cause is that the hashes of the two keys are not the same:
# Following line is False
hash(np.int64) == hash(x1_dtype)
So I cannot look one up with the other naively using switcher.get. My next idea was to stringify them, but that does not work either:
print(str(np.int64))
# prints: <class 'numpy.int64'>
print(str(x1_dtype))
# prints: int64
Yet, the pandas docs state that string comparisons should work (https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#basics-dtypes):
For methods requiring dtype arguments, strings can be specified as indicated.
Looks like I misunderstand this statement. Any idea how such a thing can be achieved?
Upvotes: 1
Views: 642
Reputation: 13427
To my knowledge, your x1_dtype is an instance of numpy.dtype, whereas np.int64 is a scalar type (a class) that the numpy.dtype class uses to construct a specific dtype.
Only x1_dtype is an instance of numpy.dtype:
>>> type(x1_dtype)
<class 'numpy.dtype'>
>>> type(np.int64)
<class 'type'>
Pass np.int64 to np.dtype to construct an actual dtype:
>>> np.int64
<class 'numpy.int64'>
>>> x1_dtype
dtype('int64')
>>> np.dtype(np.int64)
dtype('int64')
So np.dtype uses an underlying type (e.g. np.int64) to construct a data type object. Thankfully, you can access the underlying type via np.dtype(...).type and use that to compare:
>>> x1_dtype.type
<class 'numpy.int64'>
# which is the same class as np.int64
>>> np.int64
<class 'numpy.int64'>
So to get your code to work, we just need to use the underlying type of the dtype object.
>>> switcher.get(x1_dtype.type, "Key not found!")
"It's an int!"
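An alternative along the same lines is to key the dictionary on dtype objects instead of scalar classes, since np.dtype(np.int64) and a column's dtype hash and compare the same way. The following is only a sketch: the helper name describe is made up for illustration, and a NumPy array stands in for df['x1'] so the example does not depend on pandas.

```python
import numpy as np

# Keys are np.dtype instances rather than scalar classes such as np.int64.
switcher = {
    np.dtype(np.int64): "It's an int!",
    np.dtype(np.float64): "It's a float!",
}

# Stand-in for df['x1'].dtype; the explicit dtype keeps this platform independent.
x1_dtype = np.array([1, 2, 3, 4], dtype=np.int64).dtype


def describe(dtype_like, default="Key not found!"):
    """Hypothetical helper: normalize the argument with np.dtype() before the lookup."""
    return switcher.get(np.dtype(dtype_like), default)


result_from_dtype = describe(x1_dtype)    # a dtype instance
result_from_class = describe(np.int64)    # a scalar class
result_from_string = describe("int64")    # a dtype string
result_missing = describe(np.bool_)       # no such key, falls back to the default
```

Normalizing with np.dtype() on the way in means callers can pass a dtype, a scalar class, or a string such as "int64" and still hit the same key.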
Upvotes: 4
Reputation: 925
You're comparing a class (np.int64) with a dtype object (x1_dtype), which is an instance of numpy.dtype rather than of np.int64. Try comparing the dtype's underlying scalar type instead:
x1_dtype.type is np.int64
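If string keys are preferred, as the question attempted, one option is to key the dictionary on the dtype's name rather than on str() of the scalar class. This is a rough sketch with illustrative variable names, and NumPy arrays stand in for the DataFrame columns.

```python
import numpy as np

# String keys match what str(dtype) and dtype.name produce for these dtypes.
string_switcher = {
    "int64": "It's an int!",
    "float64": "It's a float!",
}

# Stand-ins for df['x1'].dtype and df['x2'].dtype.
x1_dtype = np.array([1, 2, 3, 4], dtype=np.int64).dtype
x2_dtype = np.array([1.0, 2.0], dtype=np.float64).dtype

label_x1 = string_switcher.get(x1_dtype.name, "Key not found!")
label_x2 = string_switcher.get(str(x2_dtype), "Key not found!")

# str(np.int64) is "<class 'numpy.int64'>", so the class itself does not map to a key.
label_class = string_switcher.get(str(np.int64), "Key not found!")
```

The lookup works only when the string comes from the dtype object, which is why stringifying np.int64 directly did not help in the question.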
Upvotes: 3