jcoke
jcoke

Reputation: 1891

Unpack list of dictionary into separate columns in Pandas

Let's assume I have data which is structured like such:

{ 
    "_id" : 245, 
    "connId" : "3r34b32", 
    "roomList" : [
        {
            "reportId" : 29, 
            "siteId" : 1
        }]
}

How do I go about getting the keys from the dictionary within the list, along with the value in order for it to have this expected output below.

Expected Output:

ID,connID,reportId,siteId

245,3r34b32,29,1

I'm using json_normalize but this does not work with lists, so I need to add an extra bit of code which can unpack the list that contains a dictionary.

Any help will be appreciated.

Upvotes: 3

Views: 1067

Answers (1)

Akshay Sehgal
Akshay Sehgal

Reputation: 19307

You have a nested record. You can handle them separately with record_path and them concatenate them with pd.concat()

root = pd.json_normalize(d).drop('roomList',1)
nested = pd.json_normalize(d, record_path='roomList')
output = pd.concat([root, nested],axis=1)
print(output)
   _id   connId  reportId  siteId
0  245  3r34b32        29       1

Method without explicit column names

If you don't want to specify a column name explicitly you can do the following. This checks which of the columns after normalization are nested, separately normalizes them and concatenates all of them. I have added another nested JSON for sake of example -

root = pd.json_normalize(d)
nested_cols = [i for i in root.columns if isinstance(root[i][0], list)]

l = [root.drop(nested_cols,1),]
for i in nested_cols:
    l.append(pd.json_normalize(d, record_path=i))

output = pd.concat(l, axis=1)
print(output)
   _id   connId  reportId  siteId  reportId2  siteId2
0  245  3r34b32        29       1         39        5

NOTE: If some of your nested records have same names, you many want to use some prefix while normalizing them. Check this for more details.

Upvotes: 3

Related Questions