Reputation: 419
I've created a dataframe from a dictionary populated by a JSON file. The "id" column in my dataframe contains 18 digits integer. When I print ids using python the value is equal to the original value in the JSON file, but when I try to save my dataframe as a csv file the id column is saved in scientific notation and also three digits on the right side of ids are converted to zero. How can I prevent the id value conversion to scientific notation in the csv file?
For example, I get 1.01045E+17
for id when I view the csv, and when I see the value I get 101044874404577000
, while the true value is 101044874404577281
This is my code:
with open(data_path, encoding="utf8") as input_file:
data = json.load(input_file)
df = pd.DataFrame.from_dict(data)
df.to_csv(output_path)
I have tried to change the format of the column by astype
to str
, object
, and float64
but that didn't help.
I've set float_format
argument of to_csv
function to save floating numbers with specified number of decimals, too. This didn't work either.
Upvotes: 1
Views: 2206
Reputation: 148880
The problem is not in convertion to csv, but when you load it into a spreadsheet. Excel is known to have a poor support for csv files, and unfortunately LibreOffice calc does its best to emulate Excel...
It sees a number and wrongly convert it as a double value.
Demo:
In python:
>>> x = 101044874404577281
>>> data = [{"id": x, "str": "foo"}]
>>> fd = io.StringIO()
>>> json.dump(data, fd)
>>> fd.seek(0,0)
0
>>> df = pd.read_json(fd)
>>> print(df)
id str
0 101044874404577281 foo
Ok everything is fine here. Let's go on:
>>> df.to_csv('foo.csv', index=False)
When opened in a dumb text editor, foo.csv if still fine:
id,str
101044874404577281,foo
Let's load that into a spreadsheet:
Shame on calc, it put that into scientific notation! Let's use normal display:
Sigh... it has destroyed the last digits...
TL/DR: It is not a problem in Python nor in pandas nor even with the csv file but only with Excel.
Upvotes: 3