Reputation: 51
In Python I am trying to split a column into multiple rows this
A B
ABC|XYZ|PQR 123
into
A B
ABC 123
XYZ 123
PQR 123
I have this code
df_wpipe = df[df['director'].str.contains("\|", na=False)]
df_wopipe = df[df["director"].str.contains("\|")==False]
for x in range(100):
df_a = df_wpipe.copy()
df_a['director'] = df_a['director'].str.split('\|').apply(lambda li: li[x] if len(li) == x+1 else None)
new_rows = df_wopipe.append(df_a)
if len(df_a.index)==0:
break
First str.contains("\|", na=False) seems not to be working because the df df_wpipe contains only 154 where is should have 500+ second the for loop seems not to append the rows. Could you please help. Thanks
Upvotes: 1
Views: 1108
Reputation: 294488
df.loc[df.index.repeat(df.A.str.count('\|') + 1)].assign(
A=np.concatenate(df.A.str.split('|')))
A B
0 ABC 123
0 XYZ 123
0 PQR 123
Without Numpy
df.loc[df.index.repeat(df.A.str.count('\|') + 1)].assign(
A='|'.join(df.A).split('|'))
If we name the index
we can use join
df.rename_axis('i', inplace=True)
df.A.str.split('|', expand=True).stack().to_frame('A').join(df.drop('A', 1))
A B
i
0 0 ABC 123
1 XYZ 123
2 PQR 123
Upvotes: 2
Reputation: 323326
This is unnest problem .
s=df.A.str.split('|')
pd.DataFrame({'A':s.sum(),'B':df.B.repeat(s.str.len())})
Out[22]:
A B
0 ABC 123
0 XYZ 123
0 PQR 123
Or try something new
df.set_index('B').A.str.get_dummies(sep='|').stack().reset_index().drop(0,1)
Out[29]:
B level_1
0 123 ABC
1 123 PQR
2 123 XYZ
Upvotes: 2