Brian Edwards
Brian Edwards

Reputation: 423

Pandas read_json converts string to decimal (though it has double quotes enclosing the data)

I have a JSON file with a field which is supposed to be a string that represents an NPI Number. The JSON file looks like this:

[{ ...
"npi_109":"1234567891",
 ...
}, 
{ ...more records }]

I use pandas to read it in like this:

import pandas as pd
df = pd.read_json("temp/" + file.orig_filename, encoding = 'unicode_escape')

I read into a dataframe and then use pyarrow to write to Parquet. I see that field in parquet gets defined as a decimal. To get around the issue of the field being read as a decimal (despite the enclosing double quotes in the JSON), I am converting that one column to a string as follows:

 df['npi_109'] = df['npi_109'].astype(str)

But what ends up happening is the number gets converted to: "1234567891.0" which is not what we want, so is there a workaround for this issue?

Upvotes: 1

Views: 1065

Answers (1)

SiP
SiP

Reputation: 1160

How about:

df['npi_109'] = df['npi_109'].astype(int).astype(str)

Or, if you don't need pandas to infer types when reading the json:

df = pd.read_json(filename, encoding = 'unicode_escape', dtype=False)

Or, force it to be a string column

df = pd.read_json(filename, encoding = 'unicode_escape', dtype={column_name: str})

Upvotes: 4

Related Questions