Reputation:
I have a json file which looks like this
{
"file": "name",
"main": [{
"question_no": "Q.1",
"question": "what is ?",
"answer": [{
"user": "John",
"comment": "It is defined as",
"value": {
"numbers": 2,
"submitted_value": [{
"time": "4:06",
"my_value": {
"value1": 5,
"value2": 10
}
},
{
"time": "13:47",
"my_value": {
"value1": 24,
"value2": 30
}
}
]
}
},
{
"user": "Sam",
"comment": "as John said above it simply means",
"value": {
"numbers": 2,
"submitted_value": [{
"time": "6:08",
"my_value": {
"value1": 9,
"value2": 10
}
},
{
"time": "15:24",
"my_value": {
"value1": 54,
"value2": 19
}
}
]
},
"closed": "no"
}
]
}]
}
when I do data = pd.json_normalize(file["main"], record_path='answer', meta='question_no')
the result I get is
user comment question_no value
0 John It is defined as Q.1 [{'my_value': 5, 'value_2': 10}, {'my_value': 24, 'value_2': 30}]
1 Sam as John said above it simply means Q.1 [{'my_value': 9, 'value_2': 10}, {'my_value': 54, 'value_2': 19}]
I need to access the values inside the list value
and dictionary submitted_value
to take the sum of my_value and value_2
as a new column. The actual file is a bit big so please consider the time taken to process the sum too.
desired result:
value1_sum value2_sum question_no user comment
29 40 Q.1 john It is defined as
63 29 Q.1 Sam as John said above it simply means
the position of columns is not an issue.
Upvotes: 2
Views: 91
Reputation: 7594
You can also do this:
with open('1.json', 'r+') as f:
data = json.load(f)
df = pd.json_normalize(data['main'],
record_path=['answer', 'value', 'submitted_value'],
meta=[['question_no'], ['answer', 'user'], ['answer', 'comment']])
df = df.groupby(by=['answer.user', 'question_no', 'answer.comment'], as_index=False).sum()
print(df)
answer.user question_no answer.comment my_value.value1 my_value.value2
0 John Q.1 It is defined as 29 40
1 Sam Q.1 as John said above it simply means 63 29
Upvotes: 1