Keithx
Keithx

Reputation: 3148

Parsing of nested structured json in Pandas

I'm receiving request from the API and try to process it in Python using requests library and json_normalize() function. Here are my steps:

import requests
from pandas.io.json import json_normalize 

url = "Some String"

headers = {

'Authorization':"Some Token"}

response = requests.request("GET", url, headers=headers)

data = response.json()

df = json_normalize(data)

It gives me output like that:

Col1                                                  Col2
[{'text': 'sometext', 'date':'1528322400000',...}]    [[1528322400000, 24], [1528322460000, 24]

I want to parse the nested structures inside the columns, make dataframes from them and merge on date.

The situation is that I can parse Col1 like that: df = json_normalize(data['Col1']) It will give me nice dataframe with columns of from this nested json and all is fine.

But it does not work with Col2 because basically it's the list. While performing df = json_normalize(data['Col2']) I receive an error: 'list' object has no attribute 'values'

My questions:

  1. Can I parse the nested lists just like I did with Col1 (using json_normalize() or smth other)?
  2. Will it be easier if I will make some changes to API itself and all the column names in Col2 just like it is in Col1 for easier parsing?

Thanks!

Upvotes: 0

Views: 168

Answers (2)

Vidya P V
Vidya P V

Reputation: 481

If the json data is a list of dictionaries with keys and values, like the one given below:

Data = [{'key1' : Value1, 
         'key2' : Value2},
        {'key1' : Value3, 
         'key2' : Value4},
        {'key1' : Value5, 
         'key2' : Value6}]

Then,

import pandas as pd
from pandas.io.json import json_normalize

DF = pd.DataFrame.from_dict(json_normalize(Data), orient='columns')

will give you the dataframe in the following format:

     key1     key2

0   Value1   Value2
1   Value3   Value4
2   Value5   Value6

Hope this helps.

Upvotes: 1

m33n
m33n

Reputation: 1751

Just create a dataframe from your col2 content it will work perfectly.

>>> import pandas as pd
>>>  pd.DataFrame([[1528322400000, 24], [1528322460000, 24]], columns=['c1','c2'])

              c1  c2
0  1528322400000  24
1  1528322460000  24

Is this what you want?

Upvotes: 1

Related Questions