IceBurger
IceBurger

Reputation: 155

How to convert nested json structure to dataframe

I converted a JSON into DataFrame and ended up with a column 'Structure_value' having below values as list of dictionary/dictionaries:

                   Structure_value
[{'Room': 6, 'Length': 7}, {'Room': 6, 'Length': 7}]
[{'Room': 6, 'Length': 22}]
[{'Room': 6, 'Length': 8}, {'Room': 6, 'Length': 9}]

Since it is an object so I guess it ended up in this format.

I need to split it into below four columns:

Structure_value_room_1
Structure_value_length_1
Structure_value_room_2
Structure_value_length_2

All other solutions on StackOverflow only deal with converting Simple JSON into DataFrame and not the nested structure.

P.S.: I know I can do something by explicitly naming fields but I need a generic solution so that in future any JSON of this format can be handled

[Edit]: The output should look like this:

   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  \
0                       6                         7                     6.0   
1                       6                        22                     NaN   
2                       6                         8                     6.0   

   Structure_value_length_2  
0                       7.0  
1                       NaN  
2                       9.0  

Upvotes: 3

Views: 982

Answers (2)

jezrael
jezrael

Reputation: 862551

Use list comprehension with nested dictionary comprehension with enumerate for deduplicate keys of dicts, last pass list of dictionaries to DataFrame constructor:

L = [ {f"{k}_{i}": v for i, y in enumerate(x, 1) 
                     for k, v in y.items()}
                     for x in df["Structure_value"] ]
df = pd.DataFrame(L)
print(df)

   Room_1  Length_1  Room_2  Length_2
0       6         7     6.0       7.0
1       6        22     NaN       NaN
2       6         8     6.0       9.0

For columns names from question use:

def json_to_df(df, column):

    L = [ {f"{column}_{k.lower()}_{i}": v for i, y in enumerate(x, 1) 
                         for k, v in y.items()}
                         for x in df[column] ]
    return pd.DataFrame(L)


df1 = json_to_df(df, 'Structure_value')
print(df1)
   Structure_value_room_1  Structure_value_length_1  Structure_value_room_2  \
0                       6                         7                     6.0   
1                       6                        22                     NaN   
2                       6                         8                     6.0   

   Structure_value_length_2  
0                       7.0  
1                       NaN  
2                       9.0  

Upvotes: 4

AKX
AKX

Reputation: 168903

A non-Pandas solution you can probably apply to your original JSON data, here represented by rows:

import pprint

rows = [
    {"Foo": "1", "Structure": [{'Room': 6, 'Length': 7}, {'Room': 6, 'Length': 7}]},
    {"Foo": "2", "Structure": [{'Room': 6, 'Length': 22}]},
    {"Foo": "3", "Structure": [{'Room': 6, 'Length': 8}, {'Room': 6, 'Length': 9}]},
]

for row in rows:  # Modifies `rows` in-place
    for index, room_info in enumerate(row.pop("Structure", ()), 1):
        for key, value in room_info.items():
            row[f"Structure_value_{key.lower()}_{index}"] = value

pprint.pprint(rows)

outputs

[{'Foo': '1',
  'Structure_value_length_1': 7,
  'Structure_value_length_2': 7,
  'Structure_value_room_1': 6,
  'Structure_value_room_2': 6},
 {'Foo': '2', 'Structure_value_length_1': 22, 'Structure_value_room_1': 6},
 {'Foo': '3',
  'Structure_value_length_1': 8,
  'Structure_value_length_2': 9,
  'Structure_value_room_1': 6,
  'Structure_value_room_2': 6}]

Upvotes: 0

Related Questions