Sayantan Ghosh
Sayantan Ghosh

Reputation: 338

How to expand rows based on a particular row value in a dataframe?

I have a dataframe as follows:

Col1     Col2     Col3     Col4
AB       2i       2j|2k    2y
CD       3j       3k|3p|3e 3x

So, for those rows which have pipeline separated values, I want to expand the rows as follows: (This should be the final dataframe)

Col1     Col2     Col3     Col4
AB       2i       2j       2y
AB       2i       2k       2y
CD       3j       3k       3x
CD       3j       3p       3x
CD       3j       3e       3x

So, the pipeline separated values have to be expanded into their own rows and the other field values have to be copied in. How to do that in pandas dataframe?

Upvotes: 0

Views: 462

Answers (1)

jezrael
jezrael

Reputation: 862731

Use for pandas 0.25.0+ Series.str.split with DataFrame.assign for column filled of lists and then DataFrame.explode, last for default index DataFrame.reset_index with drop=True:

df = df.assign(Col3 = df['Col3'].str.split('|')).explode('Col3').reset_index(drop=True)
print (df)
  Col1 Col2 Col3 Col4
0   AB   2i   2j   2y
1   AB   2i   2k   2y
2   CD   3j   3k   3x
3   CD   3j   3p   3x
4   CD   3j   3e   3x

EDIT: If column name has space:

print (df)
  Col1 Col2    my col Col4
0   AB   2i     2j|2k   2y
1   CD   3j  3k|3p|3e   3x

df['my col'] = df['my col'].str.split('|')
df = df.explode('my col').reset_index(drop=True)
print (df)
  Col1 Col2 my col Col4
0   AB   2i     2j   2y
1   AB   2i     2k   2y
2   CD   3j     3k   3x
3   CD   3j     3p   3x
4   CD   3j     3e   3x

Solution for oldier versions:

c = df.columns
s = (df.pop('Col3')
       .str.split('|', expand=True)
       .stack()
       .reset_index(drop=True, level=1)
       .rename('Col3'))

df = df.join(s).reset_index(drop=True)[c]
print (df)
  Col1 Col2 Col3 Col4
0   AB   2i   2j   2y
1   AB   2i   2k   2y
2   CD   3j   3k   3x
3   CD   3j   3p   3x
4   CD   3j   3e   3x

Upvotes: 2

Related Questions