Reputation: 10419
My data-structure is:
ds = [{
    "name": "groupA",
    "subGroups": [123, 456]
},
{
    "name": "groupB",
    "subGroups": ['aaa', 'bbb', 'ccc']
}]
This gives the following DataFrame:
df = pd.DataFrame(ds)
name subGroups
0 groupA [123, 456]
1 groupB [aaa, bbb, ccc]
I want:
name subGroupsFlattend
0 groupA 123
1 groupA 456
2 groupB aaa
3 groupB bbb
4 groupB ccc
Any ideas?
Upvotes: 10
Views: 7133
Reputation: 1083
This is YOBEN_S's solution, but much more efficient for large DataFrames, because it flattens the lists with itertools.chain instead of summing them.
from itertools import chain
pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),
'subGroup':list(chain.from_iterable(df.subGroups.to_list()))})
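A self-contained sketch of the same idea, assuming the `ds` list from the question. The `flat` name and the final `reset_index(drop=True)` are additions here, used only so that the index runs from 0 to 4 as in the desired output.

```python
from itertools import chain

import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

# Repeat each name once per element of its list, then flatten the lists
# in a single pass with chain.from_iterable.
flat = pd.DataFrame({
    "name": df["name"].repeat(df["subGroups"].str.len()),
    "subGroup": list(chain.from_iterable(df["subGroups"])),
}).reset_index(drop=True)  # replace the repeated labels with 0..4

print(flat)
```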
Upvotes: 0
Reputation: 862741
You can use json_normalize:
import pandas as pd
# Since pandas 1.0, json_normalize is exposed as pd.json_normalize;
# importing it from pandas.io.json is deprecated.
df = pd.json_normalize(ds, ['subGroups'], 'name').rename(columns={0:'subGroupsFlattend'})
print (df)
subGroupsFlattend name
0 123 groupA
1 456 groupA
2 aaa groupB
3 bbb groupB
4 ccc groupB
An alternative solution that flattens the list of dictionaries with a list comprehension:
L = [y for x in ds for y in zip(x["subGroups"], [x["name"]] * len(x["subGroups"]))]
print (L)
[(123, 'groupA'), (456, 'groupA'), ('aaa', 'groupB'), ('bbb', 'groupB'), ('ccc', 'groupB')]
df = pd.DataFrame(L, columns=['subGroupsFlattend','name'])
print (df)
subGroupsFlattend name
0 123 groupA
1 456 groupA
2 aaa groupB
3 bbb groupB
4 ccc groupB
EDIT: Another solution, using chain.from_iterable and repeat:
from itertools import chain
df = pd.DataFrame(ds)
df1 = pd.DataFrame({
'subGroups' : list(chain.from_iterable(df['subGroups'].tolist())),
'name' : df['name'].values.repeat(df['subGroups'].str.len())
})
print (df1)
name subGroups
0 groupA 123
1 groupA 456
2 groupB aaa
3 groupB bbb
4 groupB ccc
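On pandas 0.25 or later, `DataFrame.explode` covers this case directly. It is not part of the answer above; this is only a minimal sketch, assuming the `ds` list from the question. The `out` name and the rename to `subGroupsFlattend` are chosen here to match the desired output.

```python
import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

# explode() emits one row per list element and repeats the other columns.
out = (
    df.explode("subGroups")
      .rename(columns={"subGroups": "subGroupsFlattend"})
      .reset_index(drop=True)  # explode keeps the original row labels
)

print(out)
```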
Upvotes: 3
Reputation: 323276
You can get the desired output with the following:
pd.DataFrame({'name':df.name.repeat(df.subGroups.str.len()),'subGroup':df.subGroups.sum()})
Out[364]:
name subGroup
0 groupA 123
0 groupA 456
1 groupB aaa
1 groupB bbb
1 groupB ccc
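The output above keeps the repeated row labels 0, 0, 1, 1, 1. A short sketch of what the one-liner does and how the index could be renumbered, assuming the `ds` list from the question; the names `flat_values` and `out` and the `reset_index(drop=True)` call are additions for illustration.

```python
import pandas as pd

ds = [
    {"name": "groupA", "subGroups": [123, 456]},
    {"name": "groupB", "subGroups": ["aaa", "bbb", "ccc"]},
]
df = pd.DataFrame(ds)

# sum() on an object column of lists applies +, which concatenates them.
flat_values = df["subGroups"].sum()

out = pd.DataFrame({
    "name": df["name"].repeat(df["subGroups"].str.len()),
    "subGroup": flat_values,
})

# repeat() keeps the original row labels, hence the duplicated index.
# Dropping them gives a fresh 0..4 index.
out = out.reset_index(drop=True)

print(out)
```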
Upvotes: 5