Pandas read_json converts string to decimal (though it has double quotes enclosing the data)

Question

I have a JSON file with a field which is supposed to be a string that represents an NPI Number. The JSON file looks like this:

[{ ...
"npi_109":"1234567891",
 ...
}, 
{ ...more records }]

I use pandas to read it in like this:

import pandas as pd
df = pd.read_json("temp/" + file.orig_filename, encoding = 'unicode_escape')

I read the file into a DataFrame and then use pyarrow to write it to Parquet. In the Parquet file, that field ends up defined as a decimal. To work around the field being read as a number (despite the enclosing double quotes in the JSON), I convert that one column to a string as follows:

df['npi_109'] = df['npi_109'].astype(str)

But the value then becomes "1234567891.0", which is not what we want. Is there a workaround for this issue?

Answers (1)
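
One possible workaround is to tell read_json up front which dtype to use for that column, so the digit-only string is never inferred as a number. By default read_json infers dtypes (dtype=True), and the trailing ".0" suggests the column ended up as float64, probably because some records have no npi_109 value. The sketch below is a minimal example rather than a definitive fix: the inline sample data (including the second NPI value) is made up for illustration and stands in for the real file.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data standing in for the real JSON file.
raw = '[{"npi_109": "1234567891"}, {"npi_109": "1987654321"}]'

# dtype={"npi_109": str} asks read_json to use str for this one column
# instead of inferring a numeric type from the digit-only strings.
df = pd.read_json(StringIO(raw), dtype={"npi_109": str})

npi_values = df["npi_109"].tolist()
print(npi_values)
```

With the original call this would presumably become pd.read_json("temp/" + file.orig_filename, encoding='unicode_escape', dtype={'npi_109': str}). Passing dtype=False instead turns off dtype inference for every column, which may be preferable if several fields hold identifiers. Records with a missing npi_109 are worth checking separately, since how a null is represented after the conversion can depend on the pandas version.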
