snow
snow

Reputation: 240

How to export pandas dataframe into json and retain original datetime format?

I am working on a dataset and I applied to_json() method to export a pandas dataframe in a json file. I will then use this json file to upload to MongoDB. However I realise that the datetime format was converted to epoch timestamp. How do I retain the datetime format while exporting to JSON file as well as uploading the json file to MongoDB? Also, I do not want null fields in json output.

df:

    user_id     datetime
0   69490717    [{'checkin_date': 2021-02-01 00:00:00}]
1   67125777    [{'checkin_date': 2021-02-01 00:00:00}]
2   62747294    NaN
3   63216896    [{'checkin_date': 2021-02-01 00:00:00}]
4   51466797    [{'checkin_date': 2021-01-31 00:00:00}]
... ... ...
96  82758550    NaN
97  44662827    NaN
98  36376189    [{'checkin_date': 2021-01-18 00:00:00}]
99  71910948    [{'checkin_date': 2021-01-18 00:00:00}, {'checkout_date': 2021-01-20 00:00:00}]
100 54620533    NaN

Snippet of json output:

[{"user_id":62507249,"datetime":[{"checkin_date":1612051200000},{"checkout_date":1612051200000}]}, 
{"user_id":69546481,"datetime":[{"checkin_date":1612051200000}]}, ......]

Below is my code for converting to json:

jsonresult = df.T.apply(lambda row: row[~row.isnull()].to_json())
json_wrapped = "[%s]" % ",".join(jsonresult)

# write JSON to file
with open('jsonresult.json', 'w') as f:
    f.write(json_wrapped)

Upvotes: 2

Views: 3172

Answers (2)

jezrael
jezrael

Reputation: 862511

I think you need date_format='iso' parameter in DataFrame.to_json and for remove rows with missing values is possible use DataFrame.dropna:

a = [{"user_id":1,"datetime":[{"checkin_date":pd.Timestamp('2021-02-01 00:00:00')},
                                     {"checkout_date":pd.Timestamp('2021-02-01 00:00:00')}]},
     {"user_id":2,"datetime":[{"checkin_date":pd.Timestamp('2021-02-01 00:00:00')}]},
     {"user_id":4,"datetime": np.nan}]
df = pd.DataFrame(a)
print (df)
   user_id                                           datetime
0        1  [{'checkin_date': 2021-02-01 00:00:00}, {'chec...
1        2            [{'checkin_date': 2021-02-01 00:00:00}]
2        4                                                NaN

jsonresult = df.dropna(subset=['datetime']).to_json(date_format='iso', orient='records')
print (jsonresult)
[{"user_id":1,"datetime":[{"checkin_date":"2021-02-01T00:00:00.000Z"},
                          {"checkout_date":"2021-02-01T00:00:00.000Z"}]},
 {"user_id":2,"datetime":[{"checkin_date":"2021-02-01T00:00:00.000Z"}]}]

dictresult = df.dropna(subset=['datetime']).to_dict(orient='records')
print (dictresult)
[{'user_id': 1, 'datetime': [{'checkin_date': Timestamp('2021-02-01 00:00:00')},
                             {'checkout_date': Timestamp('2021-02-01 00:00:00')}]}, 
 {'user_id': 2, 'datetime': [{'checkin_date': Timestamp('2021-02-01 00:00:00')}]}]

Upvotes: 2

Mayank Porwal
Mayank Porwal

Reputation: 34046

You need to convert your datetime column to string using Series.astype:

df['datetime'] = df['datetime'].astype(str)

Also, your code for converting this to json is unnecessary. You can do below:

jsonresult  = df.to_json(orient='records')

Upvotes: 1

Related Questions