Reputation: 1
I have a .json file which I am reading into pandas, and then I am exporting the dataframe to .csv. However, the input '40007207255820210316' from the .json is being written as '40007207255820200000' in the csv because it is converted to scientific notation. How can I get the exact value written to the csv, as in the original .json?
import pandas as pd

data = pd.read_json("file.json")
data.rename(columns={'Key_number': 'Key'}, inplace=True)
data.to_csv("file.csv", index=False)
No other manipulations are taking place on this column.
Upvotes: 0
Views: 1448
Reputation: 2249
If you check numpy Data Types you'll find that the maximum allowed int64 is
np.iinfo(np.int64).max
9223372036854775807
and the maximum float64 is
np.finfo(np.float64).max
1.7976931348623157e+308
your number 40007207255820210316 is greater than the maximum int64
40007207255820210316 > np.iinfo(np.int64).max
True
but less than the maximum float64
40007207255820210316 > np.finfo(np.float64).max
False
so when you load the json, numbers will be loaded as dtype float64
df = pd.read_json('file.json')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 value 2 non-null float64
dtypes: float64(1)
memory usage: 144.0 bytes
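float64 has only 53 bits of mantissa, so an integer of this size cannot be represented exactly and gets rounded on load (a quick check in plain Python, outside pandas)
x = 40007207255820210316
x > 2**53
True
int(float(x)) == x
False
that rounding is exactly why the trailing digits change in your csv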
and if you try to load it with dtype int64, you'd get an error
df = pd.read_json('file.json', dtype=np.int64)
OverflowError: Python int too large to convert to C long
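note that the digits are not lost in the json file itself: Python's own json module parses the number as an exact arbitrary-precision int, so the precision only disappears when pandas converts the column to a numeric dtype (a quick check, not part of the original post)
import json
json.loads('{"value": 40007207255820210316}')['value']
40007207255820210316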
Thus, if you only need to load the json and then write the DataFrame to csv, you can do
df = pd.read_json('file.json', dtype=str)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 value 2 non-null object
dtypes: object(1)
memory usage: 144.0+ bytes
this way the column is read as a string rather than a number, and it'll be written to the csv exactly as in the original json
df.to_csv("file.csv", index=False)
Upvotes: 2
Reputation: 2242
If you truly don't need to do any manipulation, you can read everything in as strings with:
data = pd.read_json("file.json", dtype=str)
This avoids the loss of precision you are seeing in numeric columns where pandas tries to infer dtypes automatically.
Technically, dtype=False is the correct way to do this according to the docs, but the intention of dtype=str (or equivalently, dtype=object) is clearer and works as well.
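Applied to the snippet from the question (same column names; a sketch, not tested against the actual file):
import pandas as pd

data = pd.read_json("file.json", dtype=str)   # every column stays a string, so no numeric rounding
data.rename(columns={'Key_number': 'Key'}, inplace=True)
data.to_csv("file.csv", index=False)          # the digits from the json are written to the csv unchanged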
Upvotes: 0