DataPsycho

Reputation: 988

Create a data frame from a complex nested dictionary?

I have a big, deeply nested JSON file saved in .txt format. I need to access some specific key-value pairs and create a data frame or another transformed JSON object for further use. Here is a small sample with two records.

[
  {
    "ko_id": [819752],
    "concepts": [
      {
        "id": ["11A71731B880:http://ontology.intranet.com/Taxonomy/116@en"],
        "uri": ["http://ontology.intranet.com/Taxonomy/116"],
        "language": ["en"],
        "prefLabel": ["Client coverage & relationship management"]
      }
    ]
  },
  {
    "ko_id": [819753],
    "concepts": [
      {
        "id": ["11A71731B880:http://ontology.intranet.com/Taxonomy/116@en"],
        "uri": ["http://ontology.intranet.com/Taxonomy/116"],
        "language": ["en"],
        "prefLabel": ["Client coverage & relationship management"]
      }
    ]
  }
]

The following code loads the data as a list, but I probably need to access it as a dictionary. I need the "ko_id", "uri" and "prefLabel" from each record and want to put them into a pandas data frame or a dictionary for further analysis.

import json as js

with open('sample_data.txt') as data_file:
    json_sample = js.load(data_file)
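js.load returns a list because the top-level value in the file is a JSON array; each element of that list is a dict (one record), and every leaf value is wrapped in a one-element list. A minimal sketch to inspect that structure, assuming a trimmed copy of the sample (without "id" and "language") inlined as a string so it runs without sample_data.txt:

```python
import json

# Assumption: trimmed copy of the question's sample, inlined as a string.
raw = """[
  {"ko_id": [819752],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]},
  {"ko_id": [819753],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]}
]"""

json_sample = json.loads(raw)

# The top-level JSON value is an array, so the result is a list...
print(type(json_sample))                 # <class 'list'>
# ...whose elements are dicts, one per record.
first = json_sample[0]
print(sorted(first.keys()))              # ['concepts', 'ko_id']
# Every value is itself a list, hence the trailing [0] on each lookup.
print(first["ko_id"][0])                 # 819752
print(first["concepts"][0]["uri"][0])
```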

The following code gives me the exact values for the first element, but I don't know how to put it together into an algorithm that builds the final data frame.

sample_dict = json_sample[0]  # first record in the list
print(sample_dict["ko_id"][0])
print(sample_dict["concepts"][0]["prefLabel"][0])
print(sample_dict["concepts"][0]["uri"][0])
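The three lookups above can be applied to every record by looping over the list and collecting one dict per record, then passing the list of dicts to pandas.DataFrame. A sketch, assuming the same trimmed sample inlined as a string, and keeping only the first concept of each record as the lookups above do:

```python
import json
import pandas as pd

# Assumption: trimmed copy of the question's sample instead of sample_data.txt.
raw = """[
  {"ko_id": [819752],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]},
  {"ko_id": [819753],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]}
]"""
json_sample = json.loads(raw)

rows = []
for record in json_sample:               # each record is a dict
    concept = record["concepts"][0]      # first concept only
    rows.append({
        "ko_id": record["ko_id"][0],
        "uri": concept["uri"][0],
        "prefLabel": concept["prefLabel"][0],
    })

# A list of dicts becomes one row per dict; columns follow the key order.
df = pd.DataFrame(rows)
print(df)
```

The intermediate rows list is also the "transformed JSON object" option: json.dumps(rows) serialises it back to flat JSON.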

Upvotes: 0

Views: 512

Answers (2)

FJSevilla

Reputation: 4543

You can pass the data to pandas.DataFrame using a generator:

import pandas as pd
import json as js

with open('sample_data.txt') as data_file:
    json_sample = js.load(data_file)

df = pd.DataFrame(data = ((key["ko_id"][0],
                           key["concepts"][0]["prefLabel"][0],
                           key["concepts"][0]["uri"][0]) for key in json_sample),  
                  columns = ("ko_id", "prefLabel", "uri"))

Output:

>>> df

    ko_id                                  prefLabel                                        uri
0  819752  Client coverage & relationship management  http://ontology.intranet.com/Taxonomy/116   
1  819753  Client coverage & relationship management  http://ontology.intranet.com/Taxonomy/116 
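The generator above reads only concepts[0] of each record. If a record can carry several concepts, a second for clause in the same generator yields one row per concept. A sketch, assuming the trimmed sample inlined as a string; because every sample record has exactly one concept, the result matches the output above:

```python
import json
import pandas as pd

# Assumption: trimmed copy of the question's sample instead of sample_data.txt.
raw = """[
  {"ko_id": [819752],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]},
  {"ko_id": [819753],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]}
]"""
json_sample = json.loads(raw)

# The inner loop walks every concept of a record, so a record with
# several concepts produces several rows sharing the same ko_id.
df = pd.DataFrame(
    data=((record["ko_id"][0], concept["prefLabel"][0], concept["uri"][0])
          for record in json_sample
          for concept in record["concepts"]),
    columns=("ko_id", "prefLabel", "uri"),
)
print(df)
```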

Upvotes: 2

user8834780

Reputation: 1670

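import pandas as pd

frames = []
for record in json_sample:
    df = pd.DataFrame(record['concepts'])  # one row per concept
    df['ko_id'] = record['ko_id'][0]       # scalar id, broadcast to every row
    frames.append(df)
final_df = pd.concat(frames, ignore_index=True)  # DataFrame.append was removed in pandas 2.0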
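Building a frame directly from record['concepts'] leaves each cell as a one-element list (for example ['en'] rather than 'en'). One way to get plain values is to unwrap those lists before building each frame. A sketch, assuming the trimmed sample inlined as a string; unwrap is a hypothetical helper, not a pandas function:

```python
import json
import pandas as pd

# Assumption: trimmed copy of the question's sample instead of sample_data.txt.
raw = """[
  {"ko_id": [819752],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]},
  {"ko_id": [819753],
   "concepts": [{"uri": ["http://ontology.intranet.com/Taxonomy/116"],
                 "prefLabel": ["Client coverage & relationship management"]}]}
]"""
json_sample = json.loads(raw)


def unwrap(value):
    """Hypothetical helper: return the sole item of a one-element list,
    otherwise return the value unchanged."""
    if isinstance(value, list) and len(value) == 1:
        return value[0]
    return value


frames = []
for record in json_sample:
    # Unwrap every field of every concept before building the frame.
    concepts = [{k: unwrap(v) for k, v in c.items()} for c in record["concepts"]]
    df = pd.DataFrame(concepts)
    df["ko_id"] = unwrap(record["ko_id"])  # scalar, broadcast to all rows
    frames.append(df)

final_df = pd.concat(frames, ignore_index=True)
print(final_df)
```

Lists with more than one item are left as they are, so no data is silently dropped.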

Upvotes: 2
