Reputation: 240
I am working on a dataset and I applied to_json() method to export a pandas dataframe in a json file. I will then use this json file to upload to MongoDB. However I realise that the datetime format was converted to epoch timestamp. How do I retain the datetime format while exporting to JSON file as well as uploading the json file to MongoDB? Also, I do not want null fields in json output.
df:
user_id datetime
0 69490717 [{'checkin_date': 2021-02-01 00:00:00}]
1 67125777 [{'checkin_date': 2021-02-01 00:00:00}]
2 62747294 NaN
3 63216896 [{'checkin_date': 2021-02-01 00:00:00}]
4 51466797 [{'checkin_date': 2021-01-31 00:00:00}]
... ... ...
96 82758550 NaN
97 44662827 NaN
98 36376189 [{'checkin_date': 2021-01-18 00:00:00}]
99 71910948 [{'checkin_date': 2021-01-18 00:00:00}, {'checkout_date': 2021-01-20 00:00:00}]
100 54620533 NaN
Snippet of json output:
[{"user_id":62507249,"datetime":[{"checkin_date":1612051200000},{"checkout_date":1612051200000}]},
{"user_id":69546481,"datetime":[{"checkin_date":1612051200000}]}, ......]
Below is my code for converting to json:
jsonresult = df.T.apply(lambda row: row[~row.isnull()].to_json())
json_wrapped = "[%s]" % ",".join(jsonresult)
# write JSON to file
with open('jsonresult.json', 'w') as f:
f.write(json_wrapped)
Upvotes: 2
Views: 3172
Reputation: 862511
I think you need date_format='iso'
parameter in DataFrame.to_json
and for remove rows with missing values is possible use DataFrame.dropna
:
a = [{"user_id":1,"datetime":[{"checkin_date":pd.Timestamp('2021-02-01 00:00:00')},
{"checkout_date":pd.Timestamp('2021-02-01 00:00:00')}]},
{"user_id":2,"datetime":[{"checkin_date":pd.Timestamp('2021-02-01 00:00:00')}]},
{"user_id":4,"datetime": np.nan}]
df = pd.DataFrame(a)
print (df)
user_id datetime
0 1 [{'checkin_date': 2021-02-01 00:00:00}, {'chec...
1 2 [{'checkin_date': 2021-02-01 00:00:00}]
2 4 NaN
jsonresult = df.dropna(subset=['datetime']).to_json(date_format='iso', orient='records')
print (jsonresult)
[{"user_id":1,"datetime":[{"checkin_date":"2021-02-01T00:00:00.000Z"},
{"checkout_date":"2021-02-01T00:00:00.000Z"}]},
{"user_id":2,"datetime":[{"checkin_date":"2021-02-01T00:00:00.000Z"}]}]
dictresult = df.dropna(subset=['datetime']).to_dict(orient='records')
print (dictresult)
[{'user_id': 1, 'datetime': [{'checkin_date': Timestamp('2021-02-01 00:00:00')},
{'checkout_date': Timestamp('2021-02-01 00:00:00')}]},
{'user_id': 2, 'datetime': [{'checkin_date': Timestamp('2021-02-01 00:00:00')}]}]
Upvotes: 2
Reputation: 34046
You need to convert your datetime
column to string
using Series.astype
:
df['datetime'] = df['datetime'].astype(str)
Also, your code for converting this to json
is unnecessary. You can do below:
jsonresult = df.to_json(orient='records')
Upvotes: 1