Reputation: 1426
I have to below dataframe:
import pandas as pd
a = pd.DataFrame([{"name": "John",
"item" : "item1||item2||item3",
"itemVal" : "item1Val||item2Val||item3Val"},
{"name" : "Tom",
"item":"item4",
"itemVal" : "item4Val"
}
])
The dataframe is like this:
name item itemVal
John item1||item2||item3 item1Val||item2Val||item3Val
Tom item4 item4Val
I want to explode the row into multiple rows so that it will be like this (note that the item
and its itemVal
need to match).
name item itemVal
John item1 item1Val
John item2 item2Val
John item3 item3Val
Tom item4 item4Val
I have looked at other answers here:
Split (explode) pandas dataframe string entry to separate rows
pandas: How do I split text in a column into multiple rows?
But the works on only one column. How do I make it work on multiple columns? I'm using Pandas 1.0.1 and Python 3.8
Upvotes: 3
Views: 1331
Reputation: 28644
A combination of zip, product and chain can achieve the split into rows. Since this involves strings, and more importantly no numerical computation, you should get faster speeds in Python, than running it in Pandas:
from itertools import product,chain
combine = chain.from_iterable
#pair item and itemval columns
merge = zip(df.item,df.itemVal)
#pair the entires from the splits of item and itemval
merge = [zip(first.split("||"),last.split("||")) for first, last in merge]
#create a Cartesian product with the name column
merger = [product([ent],cont) for ent, cont in zip(df.name,merge)]
#create your exploded values
res = [(ent,*cont) for ent, cont in combine(merger)]
pd.DataFrame(res,columns=['name','item','itemVal'])
name item itemVal
0 John item1 item1Val
1 John item2 item2Val
2 John item3 item3Val
3 Tom item4 item4Val
Upvotes: 1
Reputation: 1356
This may not be as fast as the answer Sammywemmy suggested, nonetheless here is a generic function which works using Pandas functions. Note that explode function works only on one column at a time. So:
df = pd.DataFrame({'A': [1, 2], 'B': [['a','b'], ['c','d']], 'C': [['z','y'], ['x','w']]})
A B C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]
##Logic for multi-col explode
list_cols = {'B','C'}
other_cols = list(set(df.columns) - set(list_cols))
exploded = [df[col].explode() for col in list_cols]
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)
A B C
------
1 a z
1 b y
2 c x
2 d w
Upvotes: 0
Reputation: 195418
a = a.apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print(a)
Prints:
name item itemVal
0 John item1 item1Val
0 John item2 item2Val
0 John item3 item3Val
1 Tom item4 item4Val
EDIT: If you want to split only selected columns, you can do:
exploded = a[['item', 'itemVal']].apply(lambda x: [v.split('||') for v in x]).apply(pd.Series.explode)
print( pd.concat([a['name'], exploded], axis=1) )
Upvotes: 4