Reputation: 8670
I have a dataframe that has a field called fields
which is a list of dicts (all rows have the same format). Here is how the dataframe is structured:
formId fields
123 [{'number': 1, 'label': 'Last Name', 'value': 'Doe'}, {'number': 2, 'label': 'First Name', 'value': 'John'}]
I am trying to unpack the fields
column so it looks like:
formId Last Name First Name
123 Doe John
The code I have currently is:
for i,r in df.iterrows():
for field in r['fields']:
df.at[i, field['label']] = field['value']
However this does not seem like the most efficient way. Is there a better way to accomplish this?
Upvotes: 5
Views: 96
Reputation: 7530
Solution:
v = pd.concat([pd.json_normalize(x) for x in df["fields"]]).pivot_table(
columns="label", values="value", aggfunc=list
)
out = df[["formId"]].join(v.explode(v.columns.tolist(), ignore_index=True))
Result using example input from OP:
formId First Name Last Name
0 123 John Doe
Also works for multiple rows. For this example input:
df = pd.DataFrame(
{
"formId": [123, 456],
"fields": [
[
{"number": 1, "label": "Last Name", "value": "Doe"},
{"number": 2, "label": "First Name", "value": "John"},
],
[
{"number": 1, "label": "Last Name", "value": "Smith"},
{"number": 2, "label": "First Name", "value": "Jack"},
],
],
}
)
Result:
formId First Name Last Name
0 123 John Doe
1 456 Jack Smith
Upvotes: 2
Reputation: 195573
Personally, I'd construct new dataframe:
df = pd.DataFrame(
[
{"formId": form_id, **{f["label"]: f["value"] for f in fields}}
for form_id, fields in zip(df["formId"], df["fields"])
]
)
print(df)
Prints:
formId Last Name First Name
0 123 Doe John
Upvotes: 2
Reputation: 3706
You can use .apply and .concat to convert the dicts to series. Finally .pivot to convert the column to headers.
Data:
import pandas as pd
data = {"formId": 123, "fields": [{'number': 1, 'label': 'Last Name', 'value': 'Doe'},
{'number': 2, 'label': 'First Name', 'value': 'John'}]}
df = pd.DataFrame(data=data)
Code:
df = (pd
.concat(objs=[df, df.pop("fields").apply(func=pd.Series)], axis=1)
.pivot(index="formId", columns="label", values="value")
.reset_index()
.rename_axis(mapper=None, axis=1)
)
print(df)
Output:
formId First Name Last Name
0 123 John Doe
Upvotes: 2