Reputation: 8067
I'm new to programming and Python.
Background
My program accepts a URL, and I want to extract the username from it.
The username is the subdomain. If the subdomain is 'www', the username should be the main part of the domain instead. The rest of the domain should be discarded (e.g. '.com/', '.org/').
I've tried the following:
def get_username_from_url(url):
    if url.startswith(r'http://www.'):
        user = url.replace(r'http://www.', '', 1)
        user = user.split('.')[0]
        return user
    elif url.startswith(r'http://'):
        user = url.replace(r'http://', '', 1)
        user = user.split('.')[0]
        return user
easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print get_username_from_url(easy_url)
# output = httpwwwweirdusername (good! expected.)
print get_username_from_url(hard_url)
# output = weirdusername (bad! username should = httpwwwweirdusername)
I've tried many other combinations using strip(), split(), and replace().
Could you advise me on how to solve this relatively simple problem?
Upvotes: 0
Views: 63
Reputation:
It's possible to do this with regular expressions (the regex could probably be made more accurate or efficient).
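import re

# Optional "www." prefix (dot escaped so it only matches a literal "."),
# then capture the first run of word characters after the last usable "/"
url_pattern = re.compile(r'.*/(?:www\.)?(\w+)')

def get_username_from_url(url):
    match = url_pattern.match(url)
    if match:
        return match.group(1)
    # falls through and returns None when the URL does not match

easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"
print(get_username_from_url(easy_url))
print(get_username_from_url(hard_url))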
This outputs:
httpwwwweirdusername
httpwwwweirdusername
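One possible way to make the pattern more accurate is sketched below. This is an assumed variant, not the answerer's code: it anchors the match at the start of the string, requires a scheme such as http:// or https://, and treats the username as everything in the host up to the next ".", "/" or ":". The name URL_PATTERN is hypothetical.

```python
import re

# Hypothetical tightened pattern: a scheme at the very start, an optional
# literal "www.", then the first host label (stops at ".", "/" or ":").
URL_PATTERN = re.compile(r'^[a-z][a-z0-9+.-]*://(?:www\.)?([^./:]+)', re.IGNORECASE)


def get_username_from_url(url):
    """Return the first host label after an optional 'www.', or None."""
    match = URL_PATTERN.match(url)
    if match:
        return match.group(1)
    return None


print(get_username_from_url("http://www.httpwwwweirdusername.com/"))      # httpwwwweirdusername
print(get_username_from_url("http://httpwwwweirdusername.blogger.com/"))  # httpwwwweirdusername
print(get_username_from_url("https://www.example.org/path"))              # example
```

Anchoring on the scheme avoids depending on the greedy `.*/` backtracking to the right slash. It still does not handle URLs containing userinfo (e.g. `user@host`), which the urlparse-based answer handles via `hostname`.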
Upvotes: 0
Reputation: 474281
There is a module called urlparse (urllib.parse in Python 3) that is designed specifically for this task:
>>> from urlparse import urlparse
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> urlparse(url).hostname.split('.')[0]
'httpwwwweirdusername'
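To illustrate why `.hostname` is preferable to splitting the raw string, here is a small Python 3 sketch using `urllib.parse.urlparse` (the Python 3 home of the same function). The URL with a login, mixed case and a port is a made-up example, not from the question.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse

# Made-up URL with userinfo, mixed case and a port
url = "http://Someone@HttpWwwWeirdUsername.Blogger.com:8080/posts"
parts = urlparse(url)

print(parts.netloc)                  # Someone@HttpWwwWeirdUsername.Blogger.com:8080
print(parts.hostname)                # httpwwwweirdusername.blogger.com
print(parts.hostname.split('.')[0])  # httpwwwweirdusername
```

`netloc` is the raw authority part, while `hostname` drops the userinfo and port and lowercases the result, so splitting it on '.' gives clean labels.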
For http://www.httpwwwweirdusername.com/, however, it would output www, which is not desired. One workaround is to ignore the www part by taking the first item of the split hostname that is not equal to www:
>>> from urlparse import urlparse
>>> url = "http://www.httpwwwweirdusername.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
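The same idea can be wrapped in a function for Python 3. This is a sketch that assumes, as in the question's examples, that the username is always the leftmost hostname label other than 'www'; the None return values for unusable input are a design choice, not part of the answer above.

```python
from urllib.parse import urlparse


def get_username_from_url(url):
    """Return the first hostname label that is not 'www', or None."""
    hostname = urlparse(url).hostname
    if hostname is None:
        # urlparse only recognises a host when the URL has a "//" authority,
        # so a bare "example.com" ends up in .path and hostname is None
        return None
    # A default for next() avoids StopIteration for a host that is only "www"
    return next((label for label in hostname.split('.') if label != 'www'), None)


for url in ("http://www.httpwwwweirdusername.com/",
            "http://httpwwwweirdusername.blogger.com/",
            "https://www.example.org/path?q=1"):
    print(get_username_from_url(url))
```

This prints httpwwwweirdusername twice and then example.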
Upvotes: 4