user4461099

Reputation: 35

Using pandas.Series.dtype as dict keys?

I would like to create a dictionary keyed by pandas.Series.dtype values. However, lookups with a Series dtype do not work, as this example shows:

import pandas as pd
from pandas import DataFrame
import numpy as np

switcher = {
    np.int64: "It's an int!",
    np.float64: "It's a float!"}

df = DataFrame({'x1': [1,2,3,4], 'x2': [1.0, 2.0, 3.0, 4.0]})

x1_dtype = df['x1'].dtype

print(switcher.get(np.int64))
# prints: It's an int!

print(switcher.get(np.int64, "Key not found!"))
# prints: It's an int!

print(switcher.get(x1_dtype, "Key not found!"))
# prints: Key not found!

print(x1_dtype == np.int64)
# prints: True

I don't understand the result of the last two print statements. Apparently x1_dtype == np.int64 is True, yet switcher.get(x1_dtype) does not find a matching key. After some time I realized that the root cause is that the hashes of the two keys are not the same:

# Following line is False
hash(np.int64) == hash(x1_dtype)
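That is indeed why the lookup fails: a Python dict first compares hashes and only calls == on a key whose hash matches, so two objects that compare equal but hash differently never find each other. A minimal pure-Python sketch of the same effect (the class AlwaysEqual is hypothetical, written only to illustrate the rule):

```python
class AlwaysEqual:
    """Hypothetical class: equal to everything, but with a fixed hash of 1."""

    def __eq__(self, other):
        return True

    def __hash__(self):
        return 1


lookup = {2: "found"}
key = AlwaysEqual()

print(key == 2)                           # True: __eq__ says they are equal
print(lookup.get(key, "Key not found!"))  # Key not found!: hash 1 never matches hash(2)
```

numpy.dtype behaves like this class with respect to np.int64: its __eq__ is permissive, but its hash is not the hash of the np.int64 class.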

So I cannot naively look them up with switcher.get. My next idea was to stringify them, but that does not work either:

print(str(np.int64))
# prints: <class 'numpy.int64'>

print(str(x1_dtype))
# prints: int64

Yet, the pandas docs state that string comparisons should work (https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#basics-dtypes):

For methods requiring dtype arguments, strings can be specified as indicated.

It looks like I misunderstand this statement. Any idea how such a thing can be achieved?
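The docs sentence is about dtype *arguments*: wherever a method expects a dtype, a string such as "float64" is accepted. A dtype also compares equal to its string name, but that equality does not carry over to hashing, so a str-keyed dict still needs a str lookup key. A sketch using the dtype's .name attribute to get that string (this workaround is not from either answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [1.0, 2.0, 3.0, 4.0]})
x1_dtype = df["x1"].dtype

# Strings are accepted wherever a dtype argument is expected...
as_float = df["x1"].astype("float64")
print(as_float.dtype == np.float64)  # True

# ...and a dtype compares equal to its string name,
print(x1_dtype == "int64")  # True

# but dict lookups go by hash, so convert the dtype to a plain str first.
switcher = {"int64": "It's an int!", "float64": "It's a float!"}
print(switcher.get(x1_dtype.name, "Key not found!"))  # It's an int!
```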

Upvotes: 1

Views: 642

Answers (2)

Cameron Riddell

Reputation: 13427

To my knowledge, your x1_dtype is an instance of numpy.dtype, whereas np.int64 is a scalar type (a class) that the numpy.dtype class uses to construct a specific dtype.

Only x1_dtype is an instance of numpy.dtype:

>>> type(x1_dtype)
<class 'numpy.dtype'>

>>> type(np.int64)
<class 'type'>
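The same distinction can be checked with plain isinstance and issubclass calls (a small illustration, not part of the original answer): x1_dtype is a dtype object, while np.int64 is a class in NumPy's scalar type hierarchy.

```python
import numpy as np
import pandas as pd

x1_dtype = pd.Series([1, 2, 3, 4]).dtype

print(isinstance(x1_dtype, np.dtype))    # True: a dtype *instance*
print(isinstance(np.int64, type))        # True: np.int64 is a class
print(issubclass(np.int64, np.integer))  # True: part of NumPy's scalar type hierarchy
print(isinstance(x1_dtype, np.int64))    # False: a dtype is not an int64 value
```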

Use np.int64 to construct an actual dtype:

>>> np.int64
<class 'numpy.int64'>

>>> x1_dtype
dtype('int64')

>>> np.dtype(np.int64)
dtype('int64')
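Because np.dtype(...) accepts a scalar type, a string, or an existing dtype and returns equal dtype objects for all three, another option (a sketch, not what this answer proposes) is to normalize the dict keys with np.dtype so that they hash the same way as a Series' .dtype:

```python
import numpy as np
import pandas as pd

x1_dtype = pd.Series([1, 2, 3, 4]).dtype

# Three spellings of the same dtype, all normalized through np.dtype().
candidates = [np.dtype(np.int64), np.dtype("int64"), np.dtype(x1_dtype)]
print(all(c == x1_dtype for c in candidates))           # True
print(len({hash(c) for c in candidates + [x1_dtype]}))  # 1: equal dtypes share a hash

# dtype objects hash consistently with each other, so they work as dict keys.
switcher = {np.dtype(np.int64): "It's an int!", np.dtype(np.float64): "It's a float!"}
print(switcher.get(x1_dtype, "Key not found!"))  # It's an int!
```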

So np.dtype uses an underlying type (e.g. np.int64) to construct a data type object. Thankfully you can access the underlying type via np.dtype(...).type and use that to compare:

>>> x1_dtype.type
<class 'numpy.int64'>

# which is the same class as np.int64
>>> np.int64
<class 'numpy.int64'>

So to get your code to work, we just need to use the underlying type of the dtype object.

>>> switcher.get(x1_dtype.type, "Key not found!")
"It's an int!"
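Note that looking up .type matches exact scalar classes only: an int32 column has .type == np.int32, which misses an np.int64 key. If the goal is to classify columns by kind rather than by exact type, np.issubdtype against the abstract classes np.integer and np.floating is one alternative; describe_dtype below is a hypothetical helper, not part of the answer:

```python
import numpy as np
import pandas as pd

switcher = {np.int64: "It's an int!", np.float64: "It's a float!"}
# Exact-class lookup: an int32 dtype's .type is np.int32, not np.int64.
print(switcher.get(np.dtype("int32").type, "Key not found!"))  # Key not found!


def describe_dtype(dtype):
    """Hypothetical helper: classify a dtype by its abstract kind."""
    if np.issubdtype(dtype, np.integer):
        return "It's an int!"
    if np.issubdtype(dtype, np.floating):
        return "It's a float!"
    return "Key not found!"


df = pd.DataFrame({"x1": [1, 2, 3, 4], "x2": [1.0, 2.0, 3.0, 4.0]})
df["x3"] = df["x1"].astype("int32")

for col in df.columns:
    print(col, describe_dtype(df[col].dtype))
```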

Upvotes: 4

zyd

Reputation: 925

You're comparing a class (np.int64) with a numpy.dtype instance (x1_dtype) that wraps that class. Compare against the wrapped class instead:

x1_dtype.type == np.int64  # True

Upvotes: 3
