dtype changes when using DataFrame.to_dict

Question

I have a uint64 column in my DataFrame, but when I convert the DataFrame to a list of Python dicts using DataFrame.to_dict('records'), the uint64 values are silently converted to float:

In [24]: mid['bd_id'].head()
Out[24]:
0                0
1    6957860914294
2    7219009614965
3    7602051814214
4    7916807114255
Name: bd_id, dtype: uint64

In [25]: mid.to_dict('record')[2]['bd_id']
Out[25]: 7219009614965.0

In [26]: bd = mid['bd_id']

In [27]: bd.head().to_dict()
Out[27]: {0: 0, 1: 6957860914294, 2: 7219009614965, 3: 7602051814214, 4: 7916807114255}

How can I avoid this strange behavior?

Update

Strangely enough, if I use to_dict() instead of to_dict('records'), the bd_id values stay of type int:

In [43]: mid.to_dict()['bd_id']
Out[43]:
{0: 0,
 1: 6957860914294,
 2: 7219009614965,
...

maxymoo · Accepted Answer

It's because another column contains floats. More specifically, to_dict('records') is implemented using the values attribute of the DataFrame rather than the columns themselves. values returns a single NumPy array, so every column is implicitly upcast to one common dtype, which in your case converts uint64 to float.
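
As a minimal sketch of that upcasting, assuming a made-up two-column frame (the score column and its values are hypothetical, only there to stand in for "another column with floats"):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one uint64 column next to one float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

# Inside the frame, each column keeps its own dtype.
col_dtypes = df.dtypes.astype(str).to_dict()

# .values must return one 2-D array, so a common dtype is chosen for all columns.
values_dtype = df.values.dtype

# A single column has nothing to be merged with, so it stays uint64.
single_dtype = df["bd_id"].values.dtype

print(col_dtypes, values_dtype, single_dtype)
```

This is also why calling to_dict() on the bd_id Series alone keeps the integers: only the whole-frame array needs a shared dtype.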

If you want to get around this, you can explicitly cast your DataFrame to the object dtype first:

In [96]: df.astype(object).to_dict('records')[2]['bd_id']
Out[96]: 7602051814214
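
A self-contained version of the same workaround, again on a hypothetical frame (the score column is an assumption). Whether plain to_dict('records') still upcasts may depend on your pandas version, so this sketch only exercises the astype(object) path:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a uint64 column with a float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

# With object dtype every cell is stored as its own Python object,
# so there is no common numeric dtype to upcast to.
records = df.astype(object).to_dict("records")

value = records[2]["bd_id"]
print(type(value), value)
```

The trade-off is that an object-dtype copy of the frame is built first, which costs extra memory on large frames.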

By the way, if you are using IPython and want to see how a library function is implemented, you can bring up its source by putting ?? after the name. For pd.DataFrame.to_dict?? we see:

    ...
    elif orient.lower().startswith('r'):
        return [dict((k, v) for k, v in zip(self.columns, row))
                for row in self.values]
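
If you would rather not cast the whole frame, one alternative is to build the records from the columns themselves, so the combined array is never created. This is only a sketch, not how pandas does it: it assumes unique column names, and the frame and its score column are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a uint64 column with a float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

columns = list(df.columns)

# Series.tolist() converts each column on its own, using that column's dtype,
# so integers and floats are never merged into one array.
per_column = [df[c].tolist() for c in columns]

# zip(*per_column) walks the columns in parallel, yielding one tuple per row.
records = [dict(zip(columns, row)) for row in zip(*per_column)]

print(records[2])
```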
