Farzin
Farzin

Reputation: 419

3 digits are truncated from an 18 digit long column when I save my pandas dataframe as a csv file

I've created a dataframe from a dictionary populated by a JSON file. The "id" column in my dataframe contains 18 digits integer. When I print ids using python the value is equal to the original value in the JSON file, but when I try to save my dataframe as a csv file the id column is saved in scientific notation and also three digits on the right side of ids are converted to zero. How can I prevent the id value conversion to scientific notation in the csv file?

For example, I get 1.01045E+17 for id when I view the csv, and when I see the value I get 101044874404577000, while the true value is 101044874404577281

This is my code:

with open(data_path, encoding="utf8") as input_file:
    data = json.load(input_file)
df = pd.DataFrame.from_dict(data)
df.to_csv(output_path)

I have tried to change the format of the column by astype to str, object, and float64 but that didn't help. I've set float_format argument of to_csv function to save floating numbers with specified number of decimals, too. This didn't work either.

Upvotes: 1

Views: 2206

Answers (1)

Serge Ballesta
Serge Ballesta

Reputation: 148880

The problem is not in convertion to csv, but when you load it into a spreadsheet. Excel is known to have a poor support for csv files, and unfortunately LibreOffice calc does its best to emulate Excel...

It sees a number and wrongly convert it as a double value.

Demo:

In python:

>>> x = 101044874404577281
>>> data = [{"id": x, "str": "foo"}]
>>> fd = io.StringIO()
>>> json.dump(data, fd)
>>> fd.seek(0,0)
0
>>> df = pd.read_json(fd)
>>> print(df)
                   id  str
0  101044874404577281  foo

Ok everything is fine here. Let's go on:

>>> df.to_csv('foo.csv', index=False)

When opened in a dumb text editor, foo.csv if still fine:

id,str
101044874404577281,foo

Let's load that into a spreadsheet:

enter image description here

Shame on calc, it put that into scientific notation! Let's use normal display:

enter image description here

Sigh... it has destroyed the last digits...

TL/DR: It is not a problem in Python nor in pandas nor even with the csv file but only with Excel.

Upvotes: 3

Related Questions