Gratous
Gratous

Reputation: 51

Create Multiple New rows Based on Pipe-Delimited Column in Pandas

In Python I am trying to split a column into multiple rows this

       A                    B
  ABC|XYZ|PQR              123

into

    A                       B
   ABC                     123 
   XYZ                     123
   PQR                     123

I have this code

df_wpipe = df[df['director'].str.contains("\|", na=False)]
df_wopipe = df[df["director"].str.contains("\|")==False]
for x in range(100):
    df_a = df_wpipe.copy()
    df_a['director'] =  df_a['director'].str.split('\|').apply(lambda li: li[x] if len(li) == x+1 else None)
    new_rows = df_wopipe.append(df_a)
    if len(df_a.index)==0:
        break

First str.contains("\|", na=False) seems not to be working because the df df_wpipe contains only 154 where is should have 500+ second the for loop seems not to append the rows. Could you please help. Thanks

Upvotes: 1

Views: 1108

Answers (2)

piRSquared
piRSquared

Reputation: 294488

df.loc[df.index.repeat(df.A.str.count('\|') + 1)].assign(
    A=np.concatenate(df.A.str.split('|')))

     A    B
0  ABC  123
0  XYZ  123
0  PQR  123

Without Numpy

df.loc[df.index.repeat(df.A.str.count('\|') + 1)].assign(
    A='|'.join(df.A).split('|'))

If we name the index we can use join

df.rename_axis('i', inplace=True)
df.A.str.split('|', expand=True).stack().to_frame('A').join(df.drop('A', 1))

       A    B
i            
0 0  ABC  123
  1  XYZ  123
  2  PQR  123

Upvotes: 2

BENY
BENY

Reputation: 323326

This is unnest problem .

s=df.A.str.split('|')
pd.DataFrame({'A':s.sum(),'B':df.B.repeat(s.str.len())})
Out[22]: 
     A    B
0  ABC  123
0  XYZ  123
0  PQR  123

Or try something new

df.set_index('B').A.str.get_dummies(sep='|').stack().reset_index().drop(0,1)
Out[29]: 
     B level_1
0  123     ABC
1  123     PQR
2  123     XYZ

Upvotes: 2

Related Questions