Reputation: 31
I have a data frame where each row represents a full name and a website. I need to split that into two columns: name and website.
I've tried to use pandas str.split but I'm struggling to create a regex pattern that catches any initial 'http' plus the rest of the website. I have websites starting with http and https.
df = pd.DataFrame([['John Smith http://website.com'],['Alan Delon https://alandelon.com']])
I want a pattern that correctly identifies the website so I can split my data. Any help would be very much appreciated.
Upvotes: 0
Views: 143
Reputation: 3770
Using str.split:
# Split on the whitespace that precedes "http"; the lookahead keeps the URL intact.
pd.DataFrame(df[0].str.split(r'\s(?=http)').tolist()).rename({0: 'Name', 1: 'Website'}, axis=1)
Output:
         Name                Website
0  John Smith     http://website.com
1  Alan Delon  https://alandelon.com
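A slightly tighter variant of the same idea, as a sketch rather than the only way to do it: the lookahead requires a full `http://` or `https://` prefix, so a name that merely contains the letters "http" would not trigger a split, and `expand=True` returns the two columns directly. The variable name `parts` is just an illustration.

```python
import pandas as pd

df = pd.DataFrame([['John Smith http://website.com'],
                   ['Alan Delon https://alandelon.com']])

# Split once (n=1) on the whitespace that is immediately followed by
# "http://" or "https://". The lookahead (?=...) consumes no characters,
# so the URL is kept whole in the second column.
parts = df[0].str.split(r'\s+(?=https?://)', n=1, expand=True)
parts.columns = ['Name', 'Website']
print(parts)
```

Because the whitespace itself is the separator, the name column comes back without a trailing space.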
Upvotes: 1
Reputation: 82765
Using str.extract
Ex:
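import pandas as pd
df = pd.DataFrame([['John Smith http://website.com'],['Alan Delon https://alandelon.com']], columns=["data"])
# \s+ sits between the two groups so the name is captured without a trailing space.
df[["Name", "Url"]] = df["data"].str.extract(r"(.*?)\s+(http.*)")
print(df)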
Output:
                               data        Name                    Url
0     John Smith http://website.com  John Smith     http://website.com
1  Alan Delon https://alandelon.com  Alan Delon  https://alandelon.com
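If some rows might not contain a website at all, str.extract can be a convenient choice because non-matching rows simply come back as NaN. The sketch below assumes a hypothetical third row with no URL (not part of the original data) and uses named groups so the result columns are labelled by the pattern itself.

```python
import pandas as pd

df = pd.DataFrame({'data': ['John Smith http://website.com',
                            'Alan Delon https://alandelon.com',
                            'No Website Here']})  # hypothetical row without a URL

# Named groups become the column names of the extracted frame.
# The name is matched lazily up to the whitespace before http:// or https://.
pattern = r'^(?P<Name>.*?)\s+(?P<Url>https?://\S+)\s*$'
extracted = df['data'].str.extract(pattern)
print(extracted)
```

Rows where the pattern does not match get NaN in both columns, which makes them easy to find with `extracted['Url'].isna()`.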
Upvotes: 0