Hacendado
Hacendado

Reputation: 81

create dataframe columns from list of dictionaries of the column

I have a dataframe in which one column I have a list of dictionaries, and in this dictionaries I has the name of the of the columns I want to create and its value.

    id  stats                                                            opta_id
0  1307  [{name: 'speed', value: 5},{name: 'strength', value: 10}....]   p176278
1  2410  [{name: 'vision', value: 5}, {name: 'strength', value: 10}....] p118335
2   200  [{name: 'speed', value: 5},{name: 'vision', value: 10}....]     p92187
3  3314  [{name: 'speed', value: 5},{name: 'strength', value: 10}....]   p154976
4  9223  [{name: 'speed', value: 5},{name: 'strength', value: 10}....]   p446990

the list can have up to 80 elements and the length of it is different on each row.

How could flat this column in order to get something similar to this?

    id  stats.speed   stats.strength   stats.vision     .....              opta_id
0  1307  5              10                nan           .....              p176278
1  2410  nan            5                 10            .....              p118335
.
.
.

thank you!

Upvotes: 0

Views: 87

Answers (2)

Hacendado
Hacendado

Reputation: 81

At the end I have found a solution to my problem. First I created at temporal dataframe with every row of the column

tmp = pd.concat([pd.DataFrame(x) for x in df['stats']], keys=df.index).reset_index(level=1, drop=True)

afterwards, I do pivot_table with the 'name' column and using as value the stat

pivot = pd.pivot_table(tmp, columns='name', index=df_stats.index.values, values='stat')

Upvotes: 0

Serge Ballesta
Serge Ballesta

Reputation: 148870

Here I would first build a temporary dataframe from a list of dict created from the stats column, and then concat it with the remaining columns:

tmp = pd.DataFrame([{d['name']: d['value'] for d in row}
                    for row in df['stats']]).rename(
                        columns=lambda x: 'stats.' + x)

df = pd.concat([df['id'], tmp, df['opta_id']], axis=1)

With the shown data, it gives:

     id  stats.speed  stats.strength  stats.vision  opta_id
0  1307          5.0            10.0           NaN  p176278
1  2410          NaN            10.0           5.0  p118335
2   200          5.0             NaN          10.0   p92187
3  3314          5.0            10.0           NaN  p154976
4  9223          5.0            10.0           NaN  p446990

Upvotes: 1

Related Questions