user13494862
user13494862

Reputation:

How to access a nested dictionary which is a row in a Dataframe

I have a json file which looks like this

{
    "file": "name",
    "main": [{
                "question_no": "Q.1",
                "question": "what is ?",
                "answer": [{
                        "user": "John",
                        "comment": "It is defined as",
                        "value": {
                            "numbers": 2,
                            "submitted_value": [{
                                    "time": "4:06",
                                    "my_value": {
                                        "value1": 5,
                                        "value2": 10
                                    }
                                },
                                {
                                    "time": "13:47",
                                    "my_value": {
                                        "value1": 24,
                                        "value2": 30
                                    }
                                }
                            ]
                        }

                    },
                    {
                        "user": "Sam",
                        "comment": "as John said above it simply means",
                        "value": {
                            "numbers": 2,
                            "submitted_value": [{
                                    "time": "6:08",
                                    "my_value": {
                                        "value1": 9,
                                        "value2": 10
                                    }
                                },
                                {
                                    "time": "15:24",
                                    "my_value": {
                                        "value1": 54,
                                        "value2": 19
                                    }
                                }
                            ]
                        },
                        "closed": "no"
                    }
                ]

            }]
            }

when I do data = pd.json_normalize(file["main"], record_path='answer', meta='question_no') the result I get is

   user                             comment    question_no           value                     
0  John                    It is defined as       Q.1       [{'my_value': 5, 'value_2': 10}, {'my_value': 24, 'value_2': 30}]
1   Sam  as John said above it simply means       Q.1        [{'my_value': 9, 'value_2': 10}, {'my_value': 54, 'value_2': 19}]

I need to access the values inside the list value and dictionary submitted_value to take the sum of my_value and value_2 as a new column. The actual file is a bit big so please consider the time taken to process the sum too.

desired result:

value1_sum      value2_sum     question_no              user      comment  
 29                40                Q.1                    john    It is defined as
 63                29                Q.1                     Sam     as John said above it simply means

the position of columns is not an issue.

Upvotes: 2

Views: 91

Answers (1)

NYC Coder
NYC Coder

Reputation: 7594

You can also do this:

with open('1.json', 'r+') as f:
    data = json.load(f)

df = pd.json_normalize(data['main'],
                       record_path=['answer', 'value', 'submitted_value'],
                       meta=[['question_no'], ['answer', 'user'], ['answer', 'comment']])
df = df.groupby(by=['answer.user', 'question_no', 'answer.comment'], as_index=False).sum()
print(df)

  answer.user question_no                      answer.comment  my_value.value1  my_value.value2
0        John         Q.1                    It is defined as               29               40
1         Sam         Q.1  as John said above it simply means               63               29

Upvotes: 1

Related Questions