Reputation: 103
Have this:
items, name
0 { [{'a': 2, 'b': 1}, {'a': 4, 'b': 3}], this }
1 { [{'a': 2, 'b': 1}, {'a': 4, 'b': 3}], that }
But would like to have the list of dictionary objects exploded into (flattened?) into actual rows like this:
a, b, name
0 { 2, 1, this}
1 { 4, 3, this}
0 { 2, 1, that}
1 { 4, 3, that}
Having been trying to use melt
but with no luck, any ideas? suggestions?
Data to produce DataFrame:
data = {'items': [[{'a': 2, 'b': 1}, {'a': 4, 'b': 3}], [{'a': 2, 'b': 1}, {'a': 4, 'b': 3}]], 'name': ['this', 'that']}
Upvotes: 10
Views: 10619
Reputation: 1021
More general approach for similar situations
explode
items to separate rowsjson_normalize
'ed columns and drop the original items columnIn [1]: import pandas as pd
In [2]: data = {'items': [[{'a': 2, 'b': 1}, {'a': 4, 'b': 3}], [{'a': 2, 'b': 1}, {'a': 4, 'b': 3}]], 'name': ['this', 'that']}
In [3]: df = pd.DataFrame(data).explode('items')
In [4]: df
Out[4]:
items name
0 {'a': 2, 'b': 1} this
0 {'a': 4, 'b': 3} this
1 {'a': 2, 'b': 1} that
1 {'a': 4, 'b': 3} that
In [5]: df = df.reset_index(drop=True) # align source table and items
In [6]: df
Out[6]:
items name
0 {'a': 2, 'b': 1} this
1 {'a': 4, 'b': 3} this
2 {'a': 2, 'b': 1} that
3 {'a': 4, 'b': 3} that
In [7]: pd.json_normalize(df['items']) # just to illustrate what is happen
Out[7]:
a b
0 2 1
1 4 3
2 2 1
3 4 3
In [8]: df.join(
...: pd.json_normalize(df['items']), # "explode" dict to separate columns
...: rsuffix='_right' # just in case if you have overlapping column names
...: ).drop(columns=['items']) # delete original column
Out[8]:
name a b
0 this 2 1
0 this 2 1
1 that 4 3
1 that 4 3
Upvotes: 2
Reputation:
Another solution is to set_index
with "name" and explode
"items". Then cast the resulting Series to a DataFrame.
s = df.set_index('name')['items'].explode()
out = pd.DataFrame(s.tolist(), index=s.index).reset_index()
Output:
name a b
0 this 2 1
1 this 4 3
2 that 2 1
3 that 4 3
It appears, set_index
+ explode
+ DataFrame
is faster (at least for OP's data) than all the other options given in the other answers.
%timeit -n 1000 out = pd.concat(df['items'].apply(pd.DataFrame).tolist(), keys=df["name"]).reset_index()
%timeit -n 1000 ab = pd.DataFrame.from_dict(np.concatenate(df['items']).tolist()); lens = df['items'].str.len(); rest = df.drop('items', axis=1).iloc[df.index.repeat(lens)].reset_index(drop=True); out = ab.join(rest)
%timeit -n 1000 out = pd.concat([pd.DataFrame(df1.iloc[0]) for x,df1 in df.groupby('name')['items']],keys=df.name).reset_index().drop('level_1',axis=1)
%timeit -n 1000 s = df.set_index('name')['items'].explode(); out = pd.DataFrame(s.tolist(), index=s.index).reset_index()
2.5 ms ± 29.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.75 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.82 ms ± 433 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.46 ms ± 68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Upvotes: 4
Reputation: 375475
Another way to use concat
perhaps more cleanly:
In [11]: pd.concat(df.group.apply(pd.DataFrame).tolist(), keys=df["name"])
Out[11]:
a b
name
this 0 2 1
1 4 3
that 0 2 1
1 4 3
In [12]: pd.concat(df.group.apply(pd.DataFrame).tolist(),
keys=df["name"]).reset_index(level="name")
Out[12]:
name a b
0 this 2 1
1 this 4 3
0 that 2 1
1 that 4 3
Upvotes: 7
Reputation: 323226
pd.concat([pd.DataFrame(df1.iloc[0]) for x,df1 in df.groupby('name').group],keys=df.name)\
.reset_index().drop('level_1',1)
Out[63]:
name a b
0 this 2 1
1 this 4 3
2 that 2 1
3 that 4 3
Data Input
df = pd.DataFrame({ "group":[[{'a': 2, 'b': 1}, {'a': 4, 'b': 3}],[{'a': 2, 'b': 1}, {'a': 4, 'b': 3}]],
"name": ['this', 'that']})
Upvotes: 3
Reputation: 1
tmp_list = list()
for index, row in a.iterrows():
for list_item in row['items']:
tmp_list.append(dict(list_item.items()+[('name', row['name'])]))
pd.DataFrame(tmp_list)
a b name
0 2 1 this
1 4 3 this
2 2 1 that
3 4 3 that
Upvotes: 0
Reputation: 294258
ab = pd.DataFrame.from_dict(np.concatenate(df['items']).tolist())
lens = df['items'].str.len()
rest = df.drop('items', 1).iloc[df.index.repeat(lens)].reset_index(drop=True)
ab.join(rest)
a b name
0 2 1 this
1 4 3 this
2 2 1 that
3 4 3 that
Upvotes: 3