user2242044

Reputation: 9213

Prevent Pandas to_json() from adding time component to date object

I have a dataframe that contains some date objects. I need to convert it to JSON for use in JavaScript, which requires YYYY-MM-DD, but to_json() keeps adding a time component. I've seen a number of answers that convert to a string first, but this is part of a loop of about 15 queries, each with many columns (simplified for the SO question), and I don't want to hardcode each column conversion as there are a lot.

import pandas as pd
from datetime import date
df = pd.DataFrame(data=[[date(year=2018, month=1, day=1)]])
print(df.to_json(orient='records', date_format='iso', date_unit='s'))

Output:

[{"0":"2018-01-01T00:00:00Z"}]

Desired Output:

[{"0":"2018-01-01"}]

Upvotes: 5

Views: 6255

Answers (5)

Mattijn

Reputation: 13880

A generic solution would be as follows:

df.assign(**df.select_dtypes(['datetime']).astype(str).to_dict('list')).to_json(orient="records")

Based on dtype, it selects the datetime columns and casts them to str objects, so the date format is kept during serialization.
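A runnable sketch of this approach (the column names here are made up for illustration):

```python
import pandas as pd

# hypothetical frame with one datetime column and one numeric column
df = pd.DataFrame({
    "when": pd.to_datetime(["2018-01-01", "2018-01-02"]),
    "val": [1, 2],
})

# select_dtypes(['datetime']) picks out the datetime64 columns; astype(str)
# renders all-midnight timestamps as plain 'YYYY-MM-DD' strings, and assign
# overwrites those columns in a copy before serializing
out = df.assign(
    **df.select_dtypes(["datetime"]).astype(str).to_dict("list")
).to_json(orient="records")
print(out)  # [{"when":"2018-01-01","val":1},{"when":"2018-01-02","val":2}]
```

Because it works off the dtype, nothing needs to be hardcoded per column.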

Upvotes: 0

hcslomeu

Reputation: 1

I had that problem as well, but since I only needed the date and could discard the time zone, I was able to work around it with the following expression:

from datetime import datetime

df = pd.read_json('test.json')
df['date_hour'] = [datetime.strptime(date[0:10], '%Y-%m-%d').date() for date in df['date_hour']]

So, if df['date_hour'] in the JSON file holds 'iso'-formatted values such as "2018-01-01T00:00:00Z", you may use this solution.

This way you extract the part that really matters. Note that you must use a list comprehension here, because the conversion can only be done string by string (row by row); datetime.strptime on its own would throw an error, since it cannot be used with a Series.
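A vectorized alternative to the per-row loop is sketched below (assuming, as above, that the column arrives as strings): pandas string methods slice every value at once, so no strptime is needed.

```python
import pandas as pd

# hypothetical column of ISO timestamps, as read from a JSON file
df = pd.DataFrame({"date_hour": ["2018-01-01T00:00:00Z", "2018-01-02T12:30:00Z"]})

# .str[:10] slices 'YYYY-MM-DD' out of every string in one pass;
# .dt.date then yields plain Python date objects
df["date_hour"] = pd.to_datetime(df["date_hour"].str[:10]).dt.date
print(df["date_hour"].tolist())  # [datetime.date(2018, 1, 1), datetime.date(2018, 1, 2)]
```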

Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210832

Demo:

Source DF:

In [249]: df = pd.DataFrame({
     ...:   'val':np.random.rand(5),
     ...:   'date1':pd.date_range('2018-01-01',periods=5),
     ...:   'date2':pd.date_range('2017-12-15',periods=5)
     ...: })

In [250]: df
Out[250]:
       date1      date2       val
0 2018-01-01 2017-12-15  0.539349
1 2018-01-02 2017-12-16  0.308532
2 2018-01-03 2017-12-17  0.788588
3 2018-01-04 2017-12-18  0.526541
4 2018-01-05 2017-12-19  0.887299

In [251]: df.dtypes
Out[251]:
date1    datetime64[ns]
date2    datetime64[ns]
val             float64
dtype: object

You can cast datetime columns to strings in one command:

In [252]: df.update(df.loc[:, df.dtypes.astype(str).str.contains('date')].astype(str))

In [253]: df.dtypes
Out[253]:
date1     object
date2     object
val      float64
dtype: object

In [254]: df.to_json(orient='records')
Out[254]: '[{"date1":"2018-01-01","date2":"2017-12-15","val":0.5393488718},{"date1":"2018-01-02","date2":"2017-12-16","val":0.3085324043},{"date1":"2018-01-03","date2":"2017-12-17","val":0.7885879674},{"date1":"2018-01-04","date2":"2017-12-18","val":0.5265407505},{"date1":"2018-01-05","date2":"2017-12-19","val":0.887298853}]'

Alternatively, you can cast date columns to strings on the SQL side.
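An equivalent spelling of the cast in In [252] (a sketch, not from the answer) uses select_dtypes to find the datetime columns, rather than matching dtype names as strings:

```python
import numpy as np
import pandas as pd

# same shape as the demo frame above
df = pd.DataFrame({
    "val": np.random.rand(5),
    "date1": pd.date_range("2018-01-01", periods=5),
    "date2": pd.date_range("2017-12-15", periods=5),
})

# select_dtypes('datetime') returns the datetime64 columns; assigning the
# str-cast copy back replaces them in place with 'YYYY-MM-DD' strings
date_cols = df.select_dtypes("datetime").columns
df[date_cols] = df[date_cols].astype(str)
print(df.dtypes)  # date1 and date2 are now object
```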

Upvotes: 1

Cyzanfar

Reputation: 7136

You could use the strftime('%Y-%m-%d') format like so:

df = pd.DataFrame(data=[[date(year=2018, month=1, day=1).strftime('%Y-%m-%d')]])

print(df.to_json(orient='records', date_format='iso', date_unit='s'))

# [{"0":"2018-01-01"}]

I think this is the best approach for now until pandas adds a way to write only the date out of datetime.

Upvotes: 3

wim

Reputation: 362557

Pandas does not currently have this feature. There is an open issue about it; you should subscribe to the issue in case more options for the date_format argument are added in a future release (it seems like a reasonable feature request):

No way with to_json to write only date out of datetime #16492

Manually converting the relevant columns to string before dumping out json is likely the best option.
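A sketch of that manual conversion for the question's exact case, where the columns hold Python date objects (so their dtype is object and datetime-based selection would miss them); the column names are illustrative:

```python
import pandas as pd
from datetime import date

df = pd.DataFrame(data=[[date(2018, 1, 1), 1.5]], columns=["d", "val"])

# str(date(...)) is already 'YYYY-MM-DD', so casting the column to str
# before to_json keeps the plain date format; columns whose values are
# not dates are left untouched
for col in df.columns:
    if df[col].map(lambda v: isinstance(v, date)).all():
        df[col] = df[col].astype(str)

print(df.to_json(orient="records"))  # [{"d":"2018-01-01","val":1.5}]
```

Since the check runs over whole columns, nothing needs to be hardcoded per query.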

Upvotes: 7
