Fra

Reputation: 5188

Pandas to_json changing data type

I noticed this behavior and I'm not sure whether it's a bug. I create a DataFrame with two integer columns and one float column:

import pandas as pd
df = pd.DataFrame([[1,2,0.2],[3,2,0.1]])
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
0    2 non-null int64
1    2 non-null int64
2    2 non-null float64
dtypes: float64(1), int64(2)

If I output that to JSON, the dtype information is lost:

df.to_json(orient= 'records')

'[{"0":1.0,"1":2.0,"2":0.2},{"0":3.0,"1":2.0,"2":0.1}]'

All data is converted to float. This is a problem if, for example, a column contains nanosecond timestamps, because they are converted to exponential notation and the sub-second information is lost.
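To illustrate the underlying precision issue (this snippet is my own illustration, not part of the original report): an int64 nanosecond timestamp needs more precision than float64's 53-bit mantissa can hold, so casting it to float rounds it.

import pandas as pd

# Illustration only: an int64 nanosecond timestamp does not survive a round
# trip through float64, because float64 carries only 53 bits of mantissa.
ns = pd.Timestamp('2014-06-27 12:00:00.123456789').value  # nanoseconds as int64
print(ns)              # 1403870400123456789
print(int(float(ns)))  # rounded to the nearest representable float, the last digits differ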

I also filed the issue here: https://github.com/pydata/pandas/issues/7583

The result I was expecting is:

'[{"0":1,"1":2,"2":0.2},{"0":3,"1":2,"2":0.1}]'

Upvotes: 7

Views: 7383

Answers (2)

Elvin Rejimone

Reputation: 1

Yes, I noticed the same behaviour: if all values in a column look like a particular type, pandas will just auto-detect and convert them. In my case, I had to convert the JSON back to a DataFrame later in another module and it was a nightmare, so I had to store the datatype information in order to maintain data integrity. I found this more reliable than trying to preserve types by switching to another JSON format such as "split" or "index" instead of "records". Like this:

# Before converting to JSON, store the datatypes
data_types = data.dtypes
# (I stored them in my Redis server.)

# Then, after converting back to a DataFrame:
def ensure_data_type_integrity(data, data_types):
    for column, dtype in data_types.items():
        if dtype == 'object':
            try:
                data[column] = data[column].astype(dtype)
            except ValueError:
                print(f"Conversion error: Unable to convert column '{column}' to '{dtype}'")

Upvotes: 0

Andy Hayden

Reputation: 375475

One way is to cast the DataFrame columns to object dtype:

In [11]: df1 = df.astype(object)

In [12]: df1.to_json()
Out[12]: '{"0":{"0":1,"1":3},"1":{"0":2,"1":2},"2":{"0":0.2,"1":0.1}}'

In [13]: df1.to_json(orient='records')
Out[13]: '[{"0":1,"1":2,"2":0.2},{"0":3,"1":2,"2":0.1}]'
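If the JSON later has to be read back into a DataFrame, the original dtypes can be re-applied on the way in. A minimal round-trip sketch (the read-back step is an addition, not part of the original answer):

import pandas as pd
from io import StringIO

df = pd.DataFrame([[1, 2, 0.2], [3, 2, 0.1]])

# Serialise via object dtype so the integers are written without a decimal point.
json_text = df.astype(object).to_json(orient='records')

# Read it back, realign the column labels (read_json may return them as strings)
# and restore the original dtypes.
restored = pd.read_json(StringIO(json_text), orient='records')
restored.columns = df.columns
restored = restored.astype(df.dtypes.to_dict())
print(restored.dtypes)  # int64, int64, float64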

Upvotes: 2
