Reputation: 155
I converted a JSON into DataFrame
and ended up with a column 'Structure_value' having below values as list of dictionary/dictionaries:
Structure_value
[{'Room': 6, 'Length': 7}, {'Room': 6, 'Length': 7}]
[{'Room': 6, 'Length': 22}]
[{'Room': 6, 'Length': 8}, {'Room': 6, 'Length': 9}]
Since it is an object so I guess it ended up in this format.
I need to split it into below four columns:
Structure_value_room_1
Structure_value_length_1
Structure_value_room_2
Structure_value_length_2
All other solutions on StackOverflow only deal with converting Simple JSON into DataFrame and not the nested structure.
P.S.: I know I can do something by explicitly naming fields but I need a generic solution so that in future any JSON of this format can be handled
[Edit]: The output should look like this:
Structure_value_room_1 Structure_value_length_1 Structure_value_room_2 \
0 6 7 6.0
1 6 22 NaN
2 6 8 6.0
Structure_value_length_2
0 7.0
1 NaN
2 9.0
Upvotes: 3
Views: 982
Reputation: 862551
Use list comprehension with nested dictionary comprehension with enumerate for deduplicate keys of dicts, last pass list of dictionaries to DataFrame
constructor:
L = [ {f"{k}_{i}": v for i, y in enumerate(x, 1)
for k, v in y.items()}
for x in df["Structure_value"] ]
df = pd.DataFrame(L)
print(df)
Room_1 Length_1 Room_2 Length_2
0 6 7 6.0 7.0
1 6 22 NaN NaN
2 6 8 6.0 9.0
For columns names from question use:
def json_to_df(df, column):
L = [ {f"{column}_{k.lower()}_{i}": v for i, y in enumerate(x, 1)
for k, v in y.items()}
for x in df[column] ]
return pd.DataFrame(L)
df1 = json_to_df(df, 'Structure_value')
print(df1)
Structure_value_room_1 Structure_value_length_1 Structure_value_room_2 \
0 6 7 6.0
1 6 22 NaN
2 6 8 6.0
Structure_value_length_2
0 7.0
1 NaN
2 9.0
Upvotes: 4
Reputation: 168903
A non-Pandas solution you can probably apply to your original JSON data, here represented by rows
:
import pprint
rows = [
{"Foo": "1", "Structure": [{'Room': 6, 'Length': 7}, {'Room': 6, 'Length': 7}]},
{"Foo": "2", "Structure": [{'Room': 6, 'Length': 22}]},
{"Foo": "3", "Structure": [{'Room': 6, 'Length': 8}, {'Room': 6, 'Length': 9}]},
]
for row in rows: # Modifies `rows` in-place
for index, room_info in enumerate(row.pop("Structure", ()), 1):
for key, value in room_info.items():
row[f"Structure_value_{key.lower()}_{index}"] = value
pprint.pprint(rows)
outputs
[{'Foo': '1',
'Structure_value_length_1': 7,
'Structure_value_length_2': 7,
'Structure_value_room_1': 6,
'Structure_value_room_2': 6},
{'Foo': '2', 'Structure_value_length_1': 22, 'Structure_value_room_1': 6},
{'Foo': '3',
'Structure_value_length_1': 8,
'Structure_value_length_2': 9,
'Structure_value_room_1': 6,
'Structure_value_room_2': 6}]
Upvotes: 0