Reputation: 139
Hello everyone, below is my dataframe:
Name  University  Subject
John  Harvard     English, French
John  MIT         Economics
Alan  BU          Data Science & Math
I would like to have the following output:
Name  University  Subject
John  Harvard     English
John  Harvard     French
John  MIT         Economics
Alan  BU          Data Science
Alan  BU          Math
I have tried the code below:
df.drop('Subject', axis=1).join(df['Subject'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('Subject'))
This works, but it only splits on ','. I would also like to split on '&'.
Please help; I am fairly new to Python and am open to using any library, such as pandas or NumPy.
I found the above solution in another Stack Overflow question, but I do not fully understand the steps. Please explain them as clearly as possible.
Thanks :)
Upvotes: 0
Views: 39
Reputation: 1551
You can use a regular expression in place of just ',' to include additional separators to split on. For example:
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'John', 'Alan', 'Joe'],
                   'University': ['Harvard', 'MIT', 'BU', 'NYU'],
                   'Subject': ['English, French', 'Economics', 'Data Science & Math',
                               'Economics and French']})
# split on ',', '&', or the whole word 'and' (\b keeps 'and' from matching inside words such as 'Finland')
df = df.drop('Subject', axis=1).join(df['Subject'].str.split(r',|&|\band\b', expand=True).stack().reset_index(level=1, drop=True).rename('Subject'))
# remove extra white space
df['Subject'] = df['Subject'].str.strip()
df
   Name  University       Subject
0  John     Harvard       English
0  John     Harvard        French
1  John         MIT     Economics
2  Alan          BU  Data Science
2  Alan          BU          Math
3   Joe         NYU     Economics
3   Joe         NYU        French
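Since the question also asked what each step of the one-liner does, here is the same chain broken into named intermediate steps. This is only a sketch: the variable names (pieces, stacked, subjects, result, alt) are mine, the pattern r'\s*[,&]\s*' is one possible way to split on ',' or '&' while swallowing the surrounding spaces, and the explicit dropna() is added on the assumption that stack() may not drop the padding on every pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'John', 'Alan'],
    'University': ['Harvard', 'MIT', 'BU'],
    'Subject': ['English, French', 'Economics', 'Data Science & Math'],
})

# Step 1: split each string on ',' or '&' (plus any spaces around them).
# expand=True spreads the pieces over columns 0, 1, ...; rows with fewer
# pieces are padded with missing values.
pieces = df['Subject'].str.split(r'\s*[,&]\s*', expand=True)

# Step 2: stack() turns the columns into rows, giving a Series whose index
# has two levels: (original row label, piece number).
# dropna() makes the removal of the padding explicit (an assumption on my
# side: whether stack() drops it by itself has varied across pandas versions).
stacked = pieces.stack().dropna()

# Step 3: drop the piece-number level so only the original row label is left,
# then name the Series so join() knows what to call the new column.
subjects = stacked.reset_index(level=1, drop=True).rename('Subject')

# Step 4: remove the old column and join on the index. Labels that occur
# several times in `subjects` repeat the matching row of df.
result = df.drop('Subject', axis=1).join(subjects)

# Alternative (assumes pandas 0.25 or newer): split into lists, then explode().
alt = df.assign(Subject=df['Subject'].str.split(r'\s*[,&]\s*')).explode('Subject')

print(result)
```

If your pandas version has explode(), the alternative on the last lines is probably the easier one to read; both keep the original row label as the index, so call reset_index(drop=True) afterwards if you want a fresh 0..n-1 index.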
Upvotes: 1