Reputation: 2698
I'm trying to load a JSON file message_1.json
which has data structured as shown below:
{
"participants": [
{
"name": "X"
},
{
"name": "Y"
}
],
"messages": [
{
"sender_name": "X",
"timestamp_ms": 1608148658236,
"content": "Hi",
"type": "Generic"
},
{
"sender_name": "Y",
"timestamp_ms": 1608148604578,
"content": "Hey",
"type": "Generic"
},
{
"sender_name": "X",
"timestamp_ms": 1608148599875,
"content": "Bye",
"type": "Generic"
},
...
...
...
I have used df = pd.read_json("message_1.json")
which results in ValueError: arrays must all be same length
.
I believe a similar question has also been asked here and I've tried using the solution specified
with open('message_1.json') as json_data:
data = json.load(json_data) # Data is successfully loaded here in the above format
pd.DataFrame.from_dict(data, orient='timestamp_ms').T.set_index('timestamp_ms')
As I had no index
column in the data, I set the index to timestamp_ms
resulting in an error ValueError: only recognize index or columns for orient
I presume I've loaded the JSON in the wrong orientation.
Please Advise.
Upvotes: 1
Views: 288
Reputation: 1604
From the json-input, it seems you only want the keep the "messages" part.
import json
import pandas as pd
with open('message_1.json') as json_data:
data = json.load(json_data)
messages = data['messages']
# messages is a list, not a dict
df = pd.DataFrame(messages)
df.set_index('timestamp_ms', inplace=True)
output:
sender_name content type
timestamp_ms
1608148658236 X Hi Generic
1608148604578 Y Hey Generic
1608148599875 X Bye Generic
Upvotes: 1