Reputation: 901
I need help. How do I get the domain from a string?
For example: "Hi im Natsume, check out my site http://www.mysite.com/"
How do I get just mysite.com?
Output example:
http://www.mysite.com/ (if http entered)
www.mysite.com (if http not entered)
mysite.com (if both http and www not entered)
Upvotes: 1
Views: 8283
Reputation: 15
How about this?
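url = 'https://www.google.com/'
# Keep everything after "//www." (assumes the URL contains it)
rest = url.split('//www.')[1]
# Cut at the first "/" (assumes a path or trailing slash is present)
domain = rest[0:rest.index('/')]
print(domain)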
Upvotes: -1
Reputation: 31
The best way is to use a regex to extract the URL, then use tldextract
to get a valid domain name from that URL.
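import re
import tldextract  # third-party package (pip install tldextract)
text = "Hi im Natsume, check out my site http://www.example.com/"
# Find every http(s) URL in the text; the raw string avoids invalid-escape warnings
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
found_url = urls[0]
# Split the host into subdomain, domain and public suffix
info = tldextract.extract(found_url)
domain_name = info.domain
suffix_name = info.suffix
final_domain_name = domain_name + "." + suffix_name
print(final_domain_name)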
Upvotes: 0
Reputation: 2770
>>> import re
>>> myString = "Hi im Natsume, check out my site http://www.mysite.com/"
>>> a = re.search(r"(?P<url>https?://[^\s]+)", myString) or re.search(r"(?P<url>www[^\s]+)", myString)
>>> a.group("url")
'http://www.mysite.com/'
>>> myString = "Hi im Natsume, check out my site www.mysite.com/"
>>> a = re.search(r"(?P<url>https?://[^\s]+)", myString) or re.search(r"(?P<url>www[^\s]+)", myString)
>>> a.group("url")
'www.mysite.com/'
Upvotes: 1
Reputation: 10170
If all the sites had the same format, you could use a regexp like this (which works in this specific case):
re.findall(r'http://www\.(\w+)\.com', url)
However, you would need a more complex regexp to parse arbitrary URLs and extract the domain name.
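As a rough sketch of what a more general pattern might look like: the regexp below makes the scheme and the "www." prefix optional and captures the dotted host name that follows. The names DOMAIN_RE and find_domains are made up for this illustration, and the pattern is deliberately loose, so it will also match things like file names ("notes.txt") that merely look like hosts.

```python
import re

# Hypothetical pattern: optional "http://" or "https://", optional "www.",
# then a captured host name made of at least two dot-separated labels.
DOMAIN_RE = re.compile(
    r'(?:https?://)?(?:www\.)?([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)'
)

def find_domains(text):
    """Return every host-like token in text, without scheme or leading 'www.'."""
    return DOMAIN_RE.findall(text)

print(find_domains("Hi im Natsume, check out my site http://www.mysite.com/"))
print(find_domains("Hi im Natsume, check out my site www.mysite.com/"))
print(find_domains("Hi im Natsume, check out my site mysite.com"))
```

All three inputs from the question should give the same host here, since the scheme and "www." are matched outside the capture group.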
Upvotes: 1
Reputation: 7867
If you want to use a regular expression, one way could be:
>>> import re
>>> s = "Hi im Natsume, check out my site http://www.mysite.com/"
>>> re.findall(r'http://www\.([a-zA-Z0-9._-]*)/', s)
['mysite.com']
...assuming the URL ends with '/'. (The hyphen goes last in the character class so that it is not read as a range.)
Upvotes: 1
Reputation: 250941
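s = "Hi im Natsume, check out my site http://www.mysite.com/"
# Locate the scheme; "https://www." is one character longer than "http://www."
start = s.find("http://") if s.find("http://") != -1 else s.find("https://") + 1
# Skip the 11 characters of "http://www." and stop at the next space (or end of string)
end = s.find(" ", start + 11)
t = s[start + 11:end if end != -1 else len(s)].rstrip("/")
print(t)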
output:
mysite.com
Upvotes: 1
Reputation: 399813
Well ... You need some way to define what you consider to be something that has a "domain". One approach might be to look up a regular expression for URL-matching, and apply that to the string. If that succeeds, you at least know that the string holds a URL, and can continue to interpret the URL in order to look for a host name, from which you can then extract the domain (possibly).
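One possible way to sketch those steps in Python, assuming the URL always carries an http:// or https:// scheme: find the URL with a simple regular expression, let urllib.parse.urlparse pull out the host name, then drop a leading "www.". The function name host_from_text is made up for this illustration. Note that stripping "www." is only an approximation of "the domain"; a host such as blog.example.co.uk would need a public suffix list to be reduced correctly.

```python
import re
from urllib.parse import urlparse

def host_from_text(text):
    """Return the host of the first http(s) URL in text, or None if there is none."""
    # Step 1: check whether the string holds a URL at all.
    match = re.search(r'https?://\S+', text)
    if match is None:
        return None
    # Step 2: interpret the URL and take its host name (lower-cased, without port).
    host = urlparse(match.group(0)).hostname
    # Step 3: approximate the domain by dropping a leading "www." (an assumption).
    if host and host.startswith("www."):
        host = host[len("www."):]
    return host

print(host_from_text("Hi im Natsume, check out my site http://www.mysite.com/"))
```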
Upvotes: 1