Bijan

Reputation: 8670

Pandas Unpack List of Dicts to Columns

I have a dataframe with a column called fields that holds a list of dicts in each row (all rows have the same format). Here is how the dataframe is structured:

formId    fields
   123    [{'number': 1, 'label': 'Last Name', 'value': 'Doe'}, {'number': 2, 'label': 'First Name', 'value': 'John'}]

I am trying to unpack the fields column so it looks like:

formId    Last Name    First Name
   123          Doe          John

The code I have currently is:

for i,r in df.iterrows():
    for field in r['fields']:
        df.at[i, field['label']] = field['value']

However, this does not seem like the most efficient way. Is there a better way to accomplish this?

Upvotes: 5

Views: 96

Answers (3)

e-motta

Reputation: 7530

Solution:

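# Flatten every list of dicts into one long frame, then collect the values per label into lists.
v = pd.concat([pd.json_normalize(x) for x in df["fields"]]).pivot_table(
    columns="label", values="value", aggfunc=list
)
# Explode the lists back into one row per form (multi-column explode needs pandas 1.3+) and join on position.
out = df[["formId"]].join(v.explode(v.columns.tolist(), ignore_index=True))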

Result using example input from OP:

   formId First Name Last Name
0     123       John       Doe

This also works for multiple rows. For this example input:

df = pd.DataFrame(
    {
        "formId": [123, 456],
        "fields": [
            [
                {"number": 1, "label": "Last Name", "value": "Doe"},
                {"number": 2, "label": "First Name", "value": "John"},
            ],
            [
                {"number": 1, "label": "Last Name", "value": "Smith"},
                {"number": 2, "label": "First Name", "value": "Jack"},
            ],
        ],
    }
)

Result:

   formId First Name Last Name
0     123       John       Doe
1     456       Jack     Smith

Upvotes: 2

Andrej Kesely

Reputation: 195573

Personally, I'd construct a new dataframe:

df = pd.DataFrame(
    [
        {"formId": form_id, **{f["label"]: f["value"] for f in fields}}
        for form_id, fields in zip(df["formId"], df["fields"])
    ]
)

print(df)

Prints:

   formId Last Name First Name
0     123       Doe       John
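
As a rough, self-contained sketch of the same comprehension on more than one row: the input below is made up for illustration, and the second form deliberately lacks a "First Name" entry (the question says all rows share one format, so that gap is purely an assumption). It shows that pd.DataFrame fills keys missing from a row's dict with NaN. The name out is also just illustrative.

```python
import pandas as pd

# Hypothetical input: two forms, the second one has no "First Name" field.
df = pd.DataFrame(
    {
        "formId": [123, 456],
        "fields": [
            [
                {"number": 1, "label": "Last Name", "value": "Doe"},
                {"number": 2, "label": "First Name", "value": "John"},
            ],
            [
                {"number": 1, "label": "Last Name", "value": "Smith"},
            ],
        ],
    }
)

# One flat dict per row: formId plus a label -> value pair for each field.
out = pd.DataFrame(
    [
        {"formId": form_id, **{f["label"]: f["value"] for f in fields}}
        for form_id, fields in zip(df["formId"], df["fields"])
    ]
)

print(out)
```

Columns appear in the order the labels are first seen, and the missing "First Name" for form 456 shows up as NaN.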

Upvotes: 2

Jason Baker

Reputation: 3706

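You can use .apply with pd.concat to expand the dicts into columns, then .pivot to turn the label values into column headers. Note that the sample data below holds one dict per row (formId is broadcast to both rows) rather than a list of dicts per row.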

Data:

import pandas as pd


data = {"formId": 123, "fields": [{'number': 1, 'label': 'Last Name', 'value': 'Doe'},
                                  {'number': 2, 'label': 'First Name', 'value': 'John'}]}
df = pd.DataFrame(data=data)

Code:

df = (pd
      .concat(objs=[df, df.pop("fields").apply(func=pd.Series)], axis=1)
      .pivot(index="formId", columns="label", values="value")
      .reset_index()
      .rename_axis(mapper=None, axis=1)
      )

print(df)

Output:

   formId First Name Last Name
0     123       John       Doe
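
If the column holds a list of dicts per row, as in the question, the lists probably need to be exploded first so that each row carries a single dict. The following is a minimal sketch of that variation, not the exact code above: the two-form input and the names long, expanded, and out are assumptions for illustration, and pivot requires each (formId, label) pair to be unique.

```python
import pandas as pd

# Hypothetical input in the question's layout: one list of dicts per row.
df = pd.DataFrame(
    {
        "formId": [123, 456],
        "fields": [
            [
                {"number": 1, "label": "Last Name", "value": "Doe"},
                {"number": 2, "label": "First Name", "value": "John"},
            ],
            [
                {"number": 1, "label": "Last Name", "value": "Smith"},
                {"number": 2, "label": "First Name", "value": "Jack"},
            ],
        ],
    }
)

# One row per dict; ignore_index gives a clean 0..n-1 index for alignment.
long = df.explode("fields", ignore_index=True)

# Expand each dict into number/label/value columns next to formId.
expanded = pd.concat(
    [long[["formId"]], pd.DataFrame(long["fields"].tolist())], axis=1
)

# Turn the labels into column headers, one row per formId.
out = (
    expanded.pivot(index="formId", columns="label", values="value")
    .reset_index()
    .rename_axis(None, axis=1)
)

print(out)
```

Building the expanded frame with pd.DataFrame(list_of_dicts) instead of .apply(pd.Series) is a swapped-in choice here; it does the same job and tends to be faster on larger inputs.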

Upvotes: 2
