Why is a string integer read incorrectly with pandas.read_json?

Question

I'm not one for hyperbole, but I am really stumped by this error, and I suspect you will be too.

Here is a simple JSON document:

[
    {
        "id": "7012104767417052471",
        "session": -1332751885,
        "transactionId": "515934477",
        "ts": "2019-10-30 12:15:40 AM (+0000)",
        "timestamp": 1572394540564,
        "sku": "1234",
        "price": 39.99,
        "qty": 1,
        "ex": [
            {
                "expId": 1007519,
                "versionId": 100042440,
                "variationId": 100076318,
                "value": 1
            }
        ]
    }
]

I saved this as ex.json and then ran the following Python code:

import pandas as pd

df = pd.read_json('ex.json')

When I look at the DataFrame, the value of id has changed from "7012104767417052471" to "7012104767417052160".

Does anyone understand why Python does this? I tried it in Node.js and even Excel, and it looks fine everywhere else.

If I do this I get the right id:

import json
from pandas import json_normalize

with open('ex.json') as data_file:
    data = json.load(data_file)
df = json_normalize(data)

But I want to understand why pandas.read_json changes the value.

Trenton McKinney · Accepted Answer

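This is a known issue. By default, read_json infers a dtype for each column, and a long string of digits like this id appears to be converted through float64, which cannot represent it exactly. Specifying the dtype for that column explicitly avoids the problem: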

import pandas as pd

df = pd.read_json('test.json', dtype={'id': 'int64'})

                  id     session  transactionId                              ts               timestamp   sku  price  qty                                                                                  ex
 7012104767417052471 -1332751885      515934477  2019-10-30 12:15:40 AM (+0000) 2019-10-30 00:15:40.564  1234  39.99    1  [{'expId': 1007519, 'versionId': 100042440, 'variationId': 100076318, 'value': 1}]
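
To see where the wrong digits come from, here is a minimal sketch in plain Python. It assumes, as the known issue suggests, that the digit string passes through a 64-bit float on its way to an integer; it does not call pandas itself, and the variable names are only illustrative. Floats of this magnitude are spaced 1024 apart, so the id snaps to the nearest representable value.

```python
import math

raw_id = "7012104767417052471"  # the id exactly as it appears in the JSON

# Assumption: dtype inference parses the digit string as a float64 first.
as_float = float(raw_id)
round_tripped = int(as_float)

# Gap between neighbouring float64 values at this magnitude (Python 3.9+).
spacing = math.ulp(as_float)

print(round_tripped)  # 7012104767417052160, the value seen in the DataFrame
print(spacing)        # 1024.0
print(int(raw_id))    # 7012104767417052471, parsed directly as an integer
```

The id is well within the int64 range (the maximum is 9223372036854775807), so nothing is lost when it is parsed as an integer directly rather than through a float. If the id is only ever used as a label, keeping it as a string is another option worth considering.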
