Reputation: 169
I have dataset like this:
data = {'id': ['1','2'],
'seq': ['1, 2, 001','2, 5, 4, 5, 8, 009']}
new_df = pd.DataFrame(data)
Output:
id seq
0 1 1, 2, 001
1 2 2, 5, 4, 5, 8, 009
I want to get:
new_data = {'id': ['1', '1','2','2','2','2','2'],
'seq': ['1, 001','1, 2, 001','2, 009','2, 5, 009','2, 5, 4, 009','2, 5, 4, 5, 009','2, 5, 4, 5, 8, 009']}
new_df = pd.DataFrame(new_data)
Output:
id seq
0 1 1, 001
1 1 1, 2, 001
2 2 2, 009
3 2 2, 5, 009
4 2 2, 5, 4, 009
5 2 2, 5, 4, 5, 009
6 2 2, 5, 4, 5, 8, 009
I started from explode:
df.assign(seq=df.seq.str.split(',\s*')).explode('seq')
And now have no idea how continue. I will be glad to your comments
Upvotes: 3
Views: 71
Reputation: 862511
Use nested list comprehension with add last value and join
, then create new columns and explode
:
a=[[', '.join(x[:i]+[x[-1]]) for i,y in enumerate(x[:-1],1)] for x in df.seq.str.split(',\s*')]
df = df.assign(seq=a).explode('seq')
print (df)
id seq
0 1 1, 001
0 1 1, 2, 001
1 2 2, 009
1 2 2, 5, 009
1 2 2, 5, 4, 009
1 2 2, 5, 4, 5, 009
1 2 2, 5, 4, 5, 8, 009
Alternative solution:
data = {'id': ['1','2', '3'],
'seq': ['1, 2, 001','2, 5, 4, 5, 8, 009', '1']}
df = pd.DataFrame(data)
print (df)
id seq
0 1 1, 2, 001
1 2 2, 5, 4, 5, 8, 009
2 3 1
a = [[', '.join(x[:i]+x[-1:]) for i,y in enumerate(x[:-1],1)]
if len(x) > 1 else x for x in df.seq.str.split(',\s*')]
df = df.assign(seq=a).explode('seq')
print (df)
id seq
0 1 1, 001
0 1 1, 2, 001
1 2 2, 009
1 2 2, 5, 009
1 2 2, 5, 4, 009
1 2 2, 5, 4, 5, 009
1 2 2, 5, 4, 5, 8, 009
2 3 1
Upvotes: 2
Reputation: 5451
you can use str.split
and then apply
and explode
function like below
data = {'id': ['1','2'],
'seq': ['1, 2, 001','2, 5, 4, 5, 8, 009']}
new_df = pd.DataFrame(data)
new_df['seq'] = new_df.seq.str.split(",").apply(lambda arr: [','.join(arr[:i] + arr[-1:]) for i in range(1,len(arr))])
new_df.explode('seq')
Upvotes: 2