I would like to split text in a column of a pandas DataFrame based on multiple delimiters and create new rows for each

Question

Hello everyone, below is my dataframe:

Name      University            Subject              

John      Harvard               English, French
John      MIT                   Economics 
Alan      BU                    Data Science & Math

I would like to have the following output:

Name      University            Subject              

John      Harvard               English
John      Harvard               French
John      MIT                   Economics 
Alan      BU                    Data Science
Alan      BU                    Math

I have tried the code below:

df.drop('Subject', axis=1).join(df['Subject'].str.split(',', expand=True).stack().reset_index(level=1,drop=True).rename('Subject'))

This works, but it only splits on ','. I would also like to split on '&'.

Please help me. I am fairly new to Python and am open to using any libraries, such as pandas and NumPy.

I found the above solution in another Stack Overflow question, but I do not fully understand the steps. Please explain them as clearly as possible.

Thanks :)

jtorca · Accepted Answer

You can use a regular expression in place of just ',' to split on additional delimiters. For example:

import pandas as pd

df = pd.DataFrame({'Name':['John', 'John', 'Alan', 'Joe'], 
'University':['Harvard', 'MIT', 'BU', 'NYU'], 
'Subject':['English, French', 'Economics', 'Data Science & Math', 
'Economics and French']})

# \band\b matches 'and' only as a whole word, so names such as 'Mandarin' are not split
df = df.drop('Subject', axis=1).join(df['Subject'].str.split(r',|&|\band\b', expand=True).stack().reset_index(level=1,drop=True).rename('Subject'))

# remove extra white space
df['Subject'] = df['Subject'].str.strip()
df

   Name University       Subject
0  John    Harvard       English
0  John    Harvard        French
1  John        MIT     Economics
2  Alan         BU  Data Science
2  Alan         BU          Math
3   Joe        NYU     Economics
3   Joe        NYU        French
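
Since the question also asks what each step does, here is a rough sketch of the same chain broken into named intermediate steps. The variable names (parts, stacked, subjects, result, alt) are only illustrative, and the whole-word pattern \band\b is my assumption about how 'and' should be treated. The .dropna() call is a precaution in case your pandas version keeps missing values when stacking. The last part shows a possible alternative with DataFrame.explode, assuming pandas 0.25 or newer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Alan', 'Joe'],
    'University': ['Harvard', 'MIT', 'BU', 'NYU'],
    'Subject': ['English, French', 'Economics',
                'Data Science & Math', 'Economics and French'],
})

# Step 1: split every string into separate columns 0, 1, ...
# Rows with fewer pieces get missing values in the extra columns.
parts = df['Subject'].str.split(r',|&|\band\b', expand=True)

# Step 2: stack the columns into one long Series.
# The index becomes (original row label, piece number).
# dropna() removes the missing pieces if stack() kept them.
stacked = parts.stack().dropna()

# Step 3: drop the piece-number level so only the original
# row label is left, and name the Series 'Subject'.
subjects = stacked.reset_index(level=1, drop=True).rename('Subject')

# Step 4: remove the old column and join on the index.
# A row label that appears twice in `subjects` yields two rows.
result = df.drop('Subject', axis=1).join(subjects)

# Step 5: trim the spaces left around each piece.
result['Subject'] = result['Subject'].str.strip()

# Alternative sketch: split into lists, then explode the lists
# into rows. The pattern also consumes surrounding whitespace,
# so no strip is needed afterwards.
alt = (df.assign(Subject=df['Subject'].str.split(r'\s*(?:,|&|\band\b)\s*'))
         .explode('Subject'))

print(result)
print(alt)
```

Both result and alt keep the original row labels (0, 0, 1, 2, 2, 3, 3); call .reset_index(drop=True) on either one if you prefer a fresh 0..n-1 index.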
