tafmk

Reputation: 1

Remove Scientific notation from csv output

I have a .json file that I read into pandas and then export to .csv. However, the input value '40007207255820210316' from the .json is written as '40007207255820200000' in the csv, apparently because it is converted to scientific notation along the way. How can I write the exact value from the original .json to the csv?

import pandas as pd

data = pd.read_json("file.json")
data.rename(columns={'Key_number': 'Key'}, inplace=True)
data.to_csv("file.csv", index=False)

No other manipulations are taking place on this column.

Upvotes: 0

Views: 1448

Answers (2)

Max Pierini

Reputation: 2249

If you check the NumPy data types, you'll find that the maximum int64 value is

np.iinfo(np.int64).max
9223372036854775807

and the maximum float64 value is

np.finfo(np.float64).max
1.7976931348623157e+308

Your number 40007207255820210316 is greater than the maximum int64

40007207255820210316 > np.iinfo(np.int64).max
True

but less than the maximum float64

40007207255820210316 > np.finfo(np.float64).max
False

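so when you load the JSON, the numbers are loaded as dtype float64. A float64 carries only about 15 to 17 significant decimal digits, which is why the trailing digits of your 20-digit number are lost.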

df = pd.read_json('file.json')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   value   2 non-null      float64
dtypes: float64(1)
memory usage: 144.0 bytes
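
Being below the float64 maximum does not mean the value is stored exactly. Here is a minimal sketch using only plain Python floats (which are float64 on CPython) and the number from the question. The variable names are illustrative:

```python
import sys

original = 40007207255820210316

# Python's float is an IEEE 754 double (float64) with a 53-bit significand.
assert sys.float_info.mant_dig == 53

as_float = float(original)      # what a float64 column would hold
round_trip = int(as_float)      # exact integer value of that float

print(original)
print(round_trip)
print(round_trip == original)   # False: the low-order digits changed
print(original > 2**63 - 1)     # True: too large for int64
```

At this magnitude, neighbouring float64 values are 8192 apart, so the nearest representable float can be off by up to 4096 from the integer you started with.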

If you try to force dtype int64 instead, you get an error

df = pd.read_json('file.json', dtype=np.int64)
OverflowError: Python int too large to convert to C long

Thus, if you only need to load the JSON and write the DataFrame out as CSV, you can do

df = pd.read_json('file.json', dtype=str)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   value   2 non-null      object
dtypes: object(1)
memory usage: 144.0+ bytes

This way the value is read as a string rather than a number, and it is written to the csv unchanged

df.to_csv("file.csv", index=False)
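
If pandas is not otherwise needed, an alternative sketch uses only the standard library. Python's json module parses integers with arbitrary precision, so nothing is rounded on the way to the csv. Note that this swaps pandas for json and csv. The inline JSON text, and the assumption that the file holds a list of flat records, are illustrative and not taken from the question:

```python
import csv
import io
import json

# Hypothetical stand-in for the contents of file.json (a list of flat records).
json_text = '[{"Key_number": 40007207255820210316}, {"Key_number": 12}]'

records = json.loads(json_text)  # integers stay exact Python ints

# Rename the column as in the question.
rows = [{"Key": rec["Key_number"]} for rec in records]

buffer = io.StringIO()  # stands in for open("file.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=["Key"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

For a real file, json.load on the opened .json file and a file handle opened with newline="" would replace json_text and buffer.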

Upvotes: 2

Frodnar

Reputation: 2242

If you truly don't need to do any manipulation, you can read everything in as strings with:

data = pd.read_json("file.json", dtype=str)

This avoids the loss of precision you get in numeric values when pandas tries to infer dtypes automatically.

Technically, dtype=False is the correct way to do this according to the docs, but the intention of dtype=str (or equivalently, dtype=object) is clearer and works as well.

Upvotes: 0
