Nurbek Kuantyrov
Nurbek Kuantyrov

Reputation: 177

Split one row of data into multiple in Pandas

I have in my opinion a little tricky problem with tables in pandas. I have a Table of tags data and documents in below format :


                Tags    Docs
0   A1-0001-AA-001/002  Doc1
1   A1-0001-AA-003/004  Doc1
2   A1-0001-AA-006/009  Doc2
3   A1-111-BB-001A/B/C  Doc3
4   A1-222-BC-002A/B/C  Doc4
5   A1-222-BB-001/2/3   Doc4
6   A1-3333-CD-001/003  Doc6

Or if you want to copy the code:

d= {'Tags':['A1-0001-AA-001/002',
'A1-0001-AA-003/004',
'A1-0001-AA-006/009',
'A1-111-BB-001A/B/C',
'A1-222-BC-002A/B/C',
'A1-222-BB-001/2/3',
'A1-3333-CD-001/003',
], 'Docs': ['Doc1', 'Doc1','Doc2','Doc3','Doc4','Doc4','Doc6']}
df = pd.DataFrame(data=d)

And I would like to split the rows in "Tags" by delimiter "/" and group by document. The problem is that if I split by delimiter, the second part will be left out without the suffix and this suffix is different on each row. Is there a way to split those? Thanks in advance for any ideas!

Upvotes: 0

Views: 52

Answers (1)

jezrael
jezrael

Reputation: 862591

Solution working if same length of values after / per groups created from rows of original data.

First use Series.str.split with DataFrame.explode and then split values by lengths from first values per groups and add to all another values:

df = df.assign(Tags=df['Tags'].str.split('/')).explode('Tags')

m = df.index.duplicated()
s = df['Tags'].str.len().groupby(level=0).transform('last')

df['new'] = [x[:-y] for x, y in zip(df['Tags'], s)]
df['Tags'] = (df['Tags'].mask(m, df.pop('new')
                                   .groupby(level=0)
                                   .transform('first')
                                   .add(df['Tags'])))
df = df.reset_index(drop=True)
print (df)
              Tags  Docs
0   A1-0001-AA-001  Doc1
1   A1-0001-AA-002  Doc1
2   A1-0001-AA-003  Doc1
3   A1-0001-AA-004  Doc1
4   A1-0001-AA-006  Doc2
5   A1-0001-AA-009  Doc2
6   A1-111-BB-001A  Doc3
7   A1-111-BB-001B  Doc3
8   A1-111-BB-001C  Doc3
9   A1-222-BC-002A  Doc4
10  A1-222-BC-002B  Doc4
11  A1-222-BC-002C  Doc4
12   A1-222-BB-001  Doc4
13   A1-222-BB-002  Doc4
14   A1-222-BB-003  Doc4
15  A1-3333-CD-001  Doc6
16  A1-3333-CD-003  Doc6

Upvotes: 1

Related Questions