timfeirg

Reputation: 1512

dtype changes when using DataFrame.to_dict

I have a uint64 column in my DataFrame, but when I convert the DataFrame to a list of Python dicts using DataFrame.to_dict('record'), the values that were uint64 are silently converted to float:

In [24]: mid['bd_id'].head()
Out[24]:
0                0
1    6957860914294
2    7219009614965
3    7602051814214
4    7916807114255
Name: bd_id, dtype: uint64

In [25]: mid.to_dict('record')[2]['bd_id']
Out[25]: 7219009614965.0

In [26]: bd = mid['bd_id']

In [27]: bd.head().to_dict()
Out[27]: {0: 0, 1: 6957860914294, 2: 7219009614965, 3: 7602051814214, 4: 7916807114255}

How can I avoid this strange behavior?

Update

Strangely enough, if I use to_dict() instead of to_dict('records'), the values in the bd_id column stay int:

In [43]: mid.to_dict()['bd_id']
Out[43]:
{0: 0,
 1: 6957860914294,
 2: 7219009614965,
...

Upvotes: 9

Views: 8153

Answers (2)

Saurabh

Reputation: 7833

You can round-trip the DataFrame through JSON:

from pandas.io.json import dumps
import json
output = json.loads(dumps(mid, double_precision=0))

Upvotes: 1

maxymoo

Reputation: 36555

It's because another column has a float in it. More specifically, to_dict('records') is implemented using the values attribute of the DataFrame rather than the columns themselves, and values performs "implicit upcasting" to a single dtype that can hold every column, which in your case converts uint64 to float.
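The upcasting can be seen on a small frame. This is only a sketch: the frame and the `score` column are made up for illustration, with the `bd_id` values borrowed from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a uint64 column next to a float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

# Each column keeps its own dtype inside the DataFrame.
print(df.dtypes)

# .values has to return one 2-D array with a single dtype, so the
# uint64 column is upcast to float64 together with the float column.
values = df.values
print(values.dtype)

# A value read from a row of that array is therefore a float.
print(type(values[2, 0]), values[2, 0])
```

Any code that iterates over the rows of `df.values` sees floats in the `bd_id` position, which matches the output in the question.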

If you want to work around this, you can explicitly cast your DataFrame to the object dtype, so that values no longer needs to upcast anything:

df.astype(object).to_dict('record')[2]['bd_id']
Out[96]: 7602051814214

By the way, if you are using IPython and you want to see how a function in a library is implemented, you can bring up its source by putting ?? after its name. For pd.DataFrame.to_dict?? we see

    ...
    elif orient.lower().startswith('r'):
        return [dict((k, v) for k, v in zip(self.columns, row))
                for row in self.values]
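Since the row-wise code above goes through `self.values`, one possible alternative is to build the records column by column, so that no mixed-dtype array is created. The helper name `records_from_columns` and the sample frame are hypothetical, and the sketch assumes unique column names:

```python
import numpy as np
import pandas as pd


def records_from_columns(frame):
    """Build a list of row dicts without going through frame.values.

    Assumes the column names are unique. Series.tolist() converts each
    column separately, so an integer column yields Python ints.
    """
    columns = list(frame.columns)
    column_values = [frame[name].tolist() for name in columns]
    # zip(*column_values) regroups the per-column lists into rows.
    return [dict(zip(columns, row)) for row in zip(*column_values)]


# Hypothetical frame: a uint64 column next to a float column.
df = pd.DataFrame({
    "bd_id": np.array([0, 6957860914294, 7219009614965], dtype="uint64"),
    "score": [0.5, 1.5, 2.5],
})

records = records_from_columns(df)
print(records[2])
```

Compared with `astype(object)`, this does not copy the whole frame into an object-dtype frame first, but it is only an illustration and not how pandas itself implements `to_dict`.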

Upvotes: 18
