Accessing a dictionary using Pandas

Question

I need to create a pandas dataframe from a list of nested dictionaries .Below is my dictionary:

[
  {
    "id": "e182_1234",
    "stderr": {
      "type": "stderr",
      "upload time": "Thu Jun 25 12:24:52 +0100 2020",
      "length": 3000,
      "contents": [
        {
          "date": "20/06/25",
          "time": "12:19:39",
          "type": "ERROR",
          "text": "Exception found
java.io.Exception:Not initated
    at.apache.java.org........",
          "line_start": 12,
          "line_end": 15
        },
        {
          "date": "20/06/25",
          "time": "12:20:41",
          "type": "WARN",
          "text": "Warning as the node is accessed without started",
          "line_start": 17,
          "line_end": 17
        }
      ]
    }
  }
 ]

I tried to create a dataframe using the below code:

df=pd.DataFrame(filtered_data) #filtered_data is the above dictionary
res1=df.join(pd.DataFrame(df.pop("stderr").tolist()))
res2=res1.join(pd.DataFrame(res1.pop("contents").tolist()))

Result I got :

#df=pd.DataFrame(filtered_data)
          id                                             stderr
0  e182_1234  {'type': 'stderr', 'upload time': 'Thu Jun 25 ...

#res1=df.join(pd.DataFrame(df.pop("stderr").tolist()))
         id    type                     upload time  length                                           contents
0  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000  [{'date': '20/06/25', 'time': '12:19:39', 'typ...

#res2=res1.join(pd.DataFrame(res1.pop("contents").tolist()))
          id    type                     upload time  length                                                  0                                                  1
0  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000  {'date': '20/06/25', 'time': '12:19:39', 'type...  {'date': '20/06/25', 'time': '12:20:41', 'type...

As you can when I split those list of directories it comes with the column name 0 and 1. I want those columns to be separated like date,time,type,text,line_start,line_end as separate columns.

Expected Output:

         id    type                     upload time  length   date        time       type      text                                                                              line_start      line_end
0  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000   20/06/25   12:19:39    ERROR   Exception found
java.io.Exception:Not initated
    at.apache.java.org........       12               15
1  e182_1234  stderr  Thu Jun 25 12:24:52 +0100 2020    3000   20/06/25   12:20:41    WARN    WARN Warning as the node is accessed without started                                  17               17

How to sort this issue ? Thanks in advance!

NYC Coder · Accepted Answer

You can use json_normalize for this:

with open('1.json', 'r+') as f:
    data = json.load(f)

df = pd.json_normalize(data, record_path=['stderr', 'contents'], meta=[['id'], ['stderr', 'type']])
print(df)


       date      time   type                                               text  line_start  line_end         id stderr.type
0  20/06/25  12:19:39  ERROR  Exception found
java.io.Exception:Not initate...          12        15  e182_1234      stderr
1  20/06/25  12:20:41   WARN    Warning as the node is accessed without started          17        17  e182_1234      stderr

Accessing a dictionary using Pandas

Answers (1)

Related Questions