Ari
Ari

Reputation: 75

How to convert nested dictionaries to a pandas DataFrame?

I want to transform the result of a call from an API to a dataframe. The result of the API call is a nested dictionary, but the produced dataframe is not as I need it.

In addition to json_normalize, I tried pd.DataFrame.from_dict. However, until now had been unsuccessful. I also tried to flatten the dictionary, but nothing.

I used the following call:

[73] results = requests.get(url).json()
results

And the output was:

{'result': {'totalrows': 3124,
  'rows': [{'rownum': 1,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1089387}]},
   {'rownum': 2,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1086893}]},
   {'rownum': 3,
    'values': [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'},
     {'field': 'issueid', 'value': 472683},
     {'field': 'ticker', 'value': 'AAPL'},
     {'field': 'companyname', 'value': 'APPLE INC'},
     {'field': 'issuetitle', 'value': 'COM'},
     {'field': 'filerid', 'value': 1085803}]}

Then to produce the data frame, I used the following code:


[74] Owners = results['result']['rows']
df1 = json_normalize(Owners)
df1.head()

This was the output:

  rownum    values
0   1      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, 
           {'field': 'issueid', 'value': 472683}, {'field': 
           'ticker', 'value': 'AAPL'}, {'field': 'companyname', 
           'value': 'APPLE INC'}, {'field': 'issuetitle', 'value': 
           'COM'}, {'field': 'filerid', 'value': 1089387} 

1   2      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, 
           {'field': 'issueid', 'value': 472683}, {'field': 
           'ticker', 'value': 'AAPL'}, {'field': 'companyname', 
           'value': 'APPLE INC'}, {'field': 'issuetitle', 'value': 
           'COM'}, {'field': 'filerid', 'value': 1086893}

2   3      [{'field': 'querydate', 'value': '7/31/2019 3:19 PM'}, {'field': 
           'issueid', 'value': 472683}, {'field': 'ticker', 'value': 'AAPL'}, 
           {'field': 'companyname', 'value': 'APPLE INC'}, {'field': 
           'issuetitle', 'value': 'COM'}, {'field': 'filerid', 'value': 1085803}

However, I want to obtain a DataFrame with the following format:

Desired DataFrame format

Upvotes: 4

Views: 1348

Answers (2)

Sash Sinha
Sash Sinha

Reputation: 22473

Naive nested for loop attempt...

import pandas as pd

df = pd.DataFrame([])

for row in json["result"]["rows"]:
    rownum = row["rownum"]
    querydate = issueid = ticker = companyname = issuetitle = filerid = None
    for value_dict in row["values"]:
        if value_dict["field"] == "querydate":
            querydate = value_dict["value"]
        elif value_dict["field"] == "issueid":
            issueid = value_dict["value"]
        elif value_dict["field"] == "ticker":
            ticker = value_dict["value"]
        elif value_dict["field"] == "companyname":
            companyname = value_dict["value"]
        elif value_dict["field"] == "filerid":
            filerid = value_dict["value"]
    df = df.append(pd.DataFrame({"rownum": rownum,
                                 "querydate": querydate,
                                 "issueid": issueid,
                                 "ticker": ticker,
                                 "companyname": companyname,
                                 "issuetitle": issuetitle,
                                 "filerid": filerid,
                                }, index=[0]), ignore_index=True)

print(df)

JSON object:

json = {
    "result": {
        "totalrows": 3,
        "rows": [
            {
                "rownum": 1,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1089387
                    }
                ]
            },
            {
                "rownum": 2,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1086893
                    }
                ]
            },
            {
                "rownum": 3,
                "values": [
                    {
                        "field": "querydate",
                        "value": "7/31/2019 3:19 PM"
                    },
                    {
                        "field": "issueid",
                        "value": 472683
                    },
                    {
                        "field": "ticker",
                        "value": "AAPL"
                    },
                    {
                        "field": "companyname",
                        "value": "APPLE INC"
                    },
                    {
                        "field": "issuetitle",
                        "value": "COM"
                    },
                    {
                        "field": "filerid",
                        "value": 1085803
                    }
                ]
            }
        ]
    }
}

Output:

   rownum          querydate  issueid ticker companyname issuetitle  filerid
0       1  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1089387
1       2  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1086893
2       3  7/31/2019 3:19 PM   472683   AAPL   APPLE INC        COM  1085803

Upvotes: 0

Alexandre B.
Alexandre B.

Reputation: 5500

You can use pandas.DataFrame.from_dict but you need to remove all unnecessary data in your data. Actually, you only want to keep the field value and value per row. You can do it with list comprehension:

data = [{ field["field"]:field["value"] for field in row['values']
                    } for row in data['result']["rows"]]
print(data)
# [{'querydate': '7/31/2019 3:19 PM', 
#     'issueid': 472683, 
#     'ticker': 'AAPL', 
#     'companyname': 'APPLE INC',
#     'issuetitle': 'COM',
#     'filerid': 1089387},
# {
#     'querydate': '7/31/2019 3:19 PM',
#     'issueid': 472683,
#     'ticker': 'AAPL',
#     'companyname': 'APPLE INC',
#     'issuetitle': 'COM',
#     'filerid': 1086893},
# {
#     'querydate': '7/31/2019 3:19 PM', 
#     'issueid': 472683, 
#     'ticker': 'AAPL', 
#     'companyname': 'APPLE INC', 
#     'issuetitle': 'COM', 
#     'filerid': 1085803
# }]

Once you have this dictionary, you can call from_dict method:

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM   AAPL

If you want to get the rownum as a column (or index):

data = [{**{field["field"]:field["value"] for field in row['values']}, **{'rownum': row["rownum"]}} for row in data['result']["rows"]]

df = pd.DataFrame.from_dict(data)
print(df)
#   companyname  filerid  issueid issuetitle          querydate  rownum ticker
# 0   APPLE INC  1089387   472683        COM  7/31/2019 3:19 PM       1   AAPL
# 1   APPLE INC  1086893   472683        COM  7/31/2019 3:19 PM       2   AAPL
# 2   APPLE INC  1085803   472683        COM  7/31/2019 3:19 PM       3   AAPL

Upvotes: 1

Related Questions