A M
A M

Reputation: 11

how to split the data in a column based on multiple delimiters, into multiple columns, in pandas

I have a dataframe with only one column named 'ALL_category[![enter image description here][1]][1]'. There are multiple names in a row ranging between 1 to 3 and separated by delimiters '|', '||' or '|||', which can be either at the beginning, in between or end of the words in every row. I want to split the column into multiple columns such that the new columns contain the names. How can I do it?

Below is the code to generate the dataframe:

x = {'ALL Categories': ['Rakesh||Ramesh|','||Rajesh|','HARPRIT|||','Tushar||manmit|']}
df = pd.DataFrame(x)

When I used the below code for modification of the above dataframe, it didn't give me any result.

data = data.ALL_HOLDS.str.split(r'w', expand = True)

Upvotes: 0

Views: 81

Answers (1)

jezrael
jezrael

Reputation: 862511

I believe you need Series.str.extractall if want each word to separate column:

df1 = df['ALL Categories'].str.extractall(r'(\w+)')[0].unstack()
print (df1)
match        0       1
0       Rakesh  Ramesh
1       Rajesh     NaN
2      HARPRIT     NaN
3       Tushar  manmit

Or a bit changed code of @Chris A from comments with Series.str.strip and Series.str.split by one or more |:

df1 = df['ALL Categories'].str.strip('|').str.split(r'\|+', expand=True)
print (df1)
         0       1
0   Rakesh  Ramesh
1   Rajesh    None
2  HARPRIT    None
3   Tushar  manmit

Upvotes: 1

Related Questions