Reputation: 338
I have a dataframe as follows:
Col1 Col2 Col3 Col4
AB 2i 2j|2k 2y
CD 3j 3k|3p|3e 3x
So, for those rows which have pipeline separated values, I want to expand the rows as follows: (This should be the final dataframe)
Col1 Col2 Col3 Col4
AB 2i 2j 2y
AB 2i 2k 2y
CD 3j 3k 3x
CD 3j 3p 3x
CD 3j 3e 3x
So, the pipeline separated values have to be expanded into their own rows and the other field values have to be copied in. How to do that in pandas dataframe?
Upvotes: 0
Views: 462
Reputation: 862731
Use for pandas 0.25.0+ Series.str.split
with DataFrame.assign
for column filled of lists and then DataFrame.explode
, last for default index DataFrame.reset_index
with drop=True
:
df = df.assign(Col3 = df['Col3'].str.split('|')).explode('Col3').reset_index(drop=True)
print (df)
Col1 Col2 Col3 Col4
0 AB 2i 2j 2y
1 AB 2i 2k 2y
2 CD 3j 3k 3x
3 CD 3j 3p 3x
4 CD 3j 3e 3x
EDIT: If column name has space:
print (df)
Col1 Col2 my col Col4
0 AB 2i 2j|2k 2y
1 CD 3j 3k|3p|3e 3x
df['my col'] = df['my col'].str.split('|')
df = df.explode('my col').reset_index(drop=True)
print (df)
Col1 Col2 my col Col4
0 AB 2i 2j 2y
1 AB 2i 2k 2y
2 CD 3j 3k 3x
3 CD 3j 3p 3x
4 CD 3j 3e 3x
Solution for oldier versions:
c = df.columns
s = (df.pop('Col3')
.str.split('|', expand=True)
.stack()
.reset_index(drop=True, level=1)
.rename('Col3'))
df = df.join(s).reset_index(drop=True)[c]
print (df)
Col1 Col2 Col3 Col4
0 AB 2i 2j 2y
1 AB 2i 2k 2y
2 CD 3j 3k 3x
3 CD 3j 3p 3x
4 CD 3j 3e 3x
Upvotes: 2