Reputation: 9
I have a list of URLs in formats such as "www.blah.com/en-us" and I need to cut off anything after the "www.blah.com". I've tried using the following:
import re
website = "www.blah.com/en-us"
cleanURL = re.sub('(.|\n)*?com', "", website)
Output: '/en-us'
So I'm getting the opposite of what I want. Sorry if this post isn't correctly formatted; this is my first time asking a question.
Upvotes: 0
Views: 57
Reputation: 432
How about just using
website = "www.blah.com/en-us"
cleanURL = website.split("/",1)[0]
?
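Since the question mentions a list of URLs, the same split can be applied to each entry. This is only a minimal sketch: the list name urls and the extra sample entries are made up for illustration, as the question only shows "www.blah.com/en-us".

```python
# Hypothetical sample data; only the first entry comes from the question.
urls = ["www.blah.com/en-us", "www.blah.com", "www.example.org/a/b"]

# Keep everything before the first "/". An entry with no "/" comes back unchanged,
# because split() then returns a one-element list.
clean_urls = [url.split("/", 1)[0] for url in urls]
print(clean_urls)
```

Passing 1 as the second argument stops after the first "/", so the rest of the path is never split into pieces that get thrown away.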
Upvotes: 4
Reputation: 603
Is using regex a must? If there's no protocol (e.g. http://) in the URLs that you're trying to process, you could just use your_url_string.split('/', 1)[0], which splits on the first instance of '/' and gives you the part before the split.
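If some of the URLs do carry a protocol, splitting on the first '/' would return just "http:". One possible way around that, sketched here with the standard library's urllib.parse.urlsplit, is shown below. The helper name host_only and the scheme check are assumptions for illustration, not something from the question.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "ftp://".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")

def host_only(url):
    """Return the host part of url, with or without a leading scheme (hypothetical helper)."""
    # urlsplit only fills in netloc when the host is introduced by "//",
    # so add it for bare inputs like "www.blah.com/en-us".
    if not _SCHEME.match(url) and not url.startswith("//"):
        url = "//" + url
    return urlsplit(url).netloc

print(host_only("www.blah.com/en-us"))
print(host_only("http://www.blah.com/en-us"))
```

Note that netloc also includes any port or user info present in the URL, so this may need adjusting if your data contains those.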
Upvotes: 2