Natsume

Reputation: 901

Get domain from string? - Python

I need help. How do I get the domain from a string?

For example: "Hi im Natsume, check out my site http://www.mysite.com/"

How do I get just mysite.com?

Expected output:

http://www.mysite.com/ (if http entered)

www.mysite.com (if http not entered)

mysite.com (if both http and www not entered)

Upvotes: 1

Views: 8283

Answers (7)

Jairam Jidgekar

Reputation: 15

How about this?

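url = 'https://www.google.com/'

# Take everything after '//www.' (assumes the URL contains 'www.')
rest = url.split('//www.')[1]

# Keep the host: everything up to the next '/'
domain = rest[0:rest.index('/')]

print(domain)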
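The split above raises an IndexError when the URL has no "//www." part. A minimal sketch in the same string-splitting spirit, assuming the input is a single URL rather than a whole sentence; `domain_from_url` is a hypothetical helper name:

```python
def domain_from_url(url):
    """Return the host of a single URL, without a leading 'www.'."""
    # Drop the scheme if there is one: "https://www.google.com/" -> "www.google.com/"
    _, sep, rest = url.partition("//")
    if not sep:
        rest = url  # no scheme, e.g. "www.google.com"
    # Keep only the host part, up to the first "/"
    host = rest.split("/", 1)[0]
    # Strip a leading "www." if there is one
    if host.startswith("www."):
        host = host[len("www."):]
    return host


print(domain_from_url("https://www.google.com/"))   # google.com
print(domain_from_url("http://google.com/search"))  # google.com
print(domain_from_url("www.google.com"))            # google.com
```

This still does no validation: any string is treated as a URL, and ports or user names in the host part are not handled.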

Upvotes: -1

Ayush Kurlekar

Reputation: 31

The best way is to use a regex to extract the URL, then use tldextract to get a valid domain name from it.

import re
import tldextract

text = "Hi im Natsume, check out my site http://www.example.com/"
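# Find http:// or https:// URLs in the text (raw string, so the backslashes reach re unchanged)
urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)
found_url = urls[0]
# Split the host into subdomain, domain and public suffix
info = tldextract.extract(found_url)
domain_name = info.domain
suffix_name = info.suffix
final_domain_name = domain_name + "." + suffix_name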
print(final_domain_name)
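The reason to use tldextract rather than simply keeping the last two labels of the host is that some public suffixes span more than one label (co.uk, for example); tldextract consults the Public Suffix List to tell them apart. A small standard-library illustration of where the naive rule goes wrong (`naive_domain` is a hypothetical helper, not part of tldextract):

```python
from urllib.parse import urlsplit


def naive_domain(url):
    """Keep the last two dot-separated labels of the host (naive rule)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


print(naive_domain("http://www.example.com/"))    # example.com  (correct)
print(naive_domain("http://www.bbc.co.uk/news"))  # co.uk        (wrong: should be bbc.co.uk)
```

For the second URL, tldextract would report the domain as bbc and the suffix as co.uk, which the two-label rule cannot know.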

Upvotes: 0

shiva

Reputation: 2770

>>> import re
>>> myString = "Hi im Natsume, check out my site http://www.mysite.com/"
>>> a = re.search(r"(?P<url>https?://[^\s]+)", myString) or re.search(r"(?P<url>www[^\s]+)", myString)
>>> a.group("url")
'http://www.mysite.com/'
>>> myString = "Hi im Natsume, check out my site www.mysite.com/"
>>> a = re.search(r"(?P<url>https?://[^\s]+)", myString) or re.search(r"(?P<url>www[^\s]+)", myString)
>>> a.group("url")
'www.mysite.com/'
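The chain above covers the first two cases in the question but not the third (a bare mysite.com). One way to extend it, as a rough sketch: add a last-resort pattern for a bare name.tld token. That third pattern is a heuristic assumption rather than a URL grammar, and it will also match things like file names (report.pdf); `find_link` is a hypothetical name.

```python
import re


def find_link(text):
    """Return the first link-like token, as written, or None.

    Tries, in order: a token starting with http:// or https://, a token
    starting with www., and finally a bare name.tld token (heuristic).
    """
    patterns = (
        r"https?://\S+",
        r"www\.\S+",
        # Heuristic: dot-separated labels ending in a TLD of 2+ letters
        r"\b[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}\b/?",
    )
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(0)
    return None


print(find_link("check out my site http://www.mysite.com/"))  # http://www.mysite.com/
print(find_link("check out my site www.mysite.com/"))         # www.mysite.com/
print(find_link("check out my site mysite.com"))              # mysite.com
```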

Upvotes: 1

user278064

Reputation: 10170

If all the sites had the same format, you could use a regexp like this (which works in this specific case):

re.findall(r'http://www\.(\w+)\.com', url)

However, to handle arbitrary URLs you would need a more complex regexp that can parse any URL and extract the domain name.
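A somewhat more general pattern, as a sketch only: it assumes the URL starts with http:// or https://, skips an optional "www.", and ends the host at the first "/", ":", "?", "#" or whitespace. It does not handle URLs without a scheme, user:password@ prefixes, or IPv6 literals; `HOST_RE` is a hypothetical name.

```python
import re

# Host after an http(s) scheme, optionally skipping "www."; the host ends at
# the first "/", ":", "?", "#" or whitespace character.
HOST_RE = re.compile(r"https?://(?:www\.)?([^/:?#\s]+)")

text = ("Hi im Natsume, check out my site http://www.mysite.com/ "
        "or https://blog.example.org:8080/posts?id=1")
print(HOST_RE.findall(text))  # ['mysite.com', 'blog.example.org']
```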

Upvotes: 1

theharshest

Reputation: 7867

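If you want to use a regular expression, one way is:

>>> import re
>>> s = "Hi im Natsume, check out my site http://www.mysite.com/"
>>> re.findall(r'http://www\.([a-zA-Z0-9._-]*)/', s)
['mysite.com']

This assumes the URL ends with '/'. Note that the '-' goes last inside the brackets so it is treated as a literal hyphen rather than as a character range.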

Upvotes: 1

Ashwini Chaudhary

Reputation: 250941

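s = "Hi im Natsume, check out my site http://www.mysite.com/"
# "https://www." is one character longer than "http://www." (12 vs 11),
# so the +1 on the https branch lines both cases up at offset start+11.
start = s.find("http://") if s.find("http://") != -1 else s.find("https://") + 1
# The host ends at the next '/' (assumes a trailing slash or a path follows it)
t = s[start+11:s.find("/", start+11)]
print(t)

Output: mysite.com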

Upvotes: 1

unwind

Reputation: 399813

Well... you need some way to define what you consider to be something that has a "domain". One approach might be to look up a regular expression for URL matching and apply it to the string. If that succeeds, you at least know that the string holds a URL, and you can go on to interpret the URL to find a host name, from which you can then (possibly) extract the domain.
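A minimal sketch of that pipeline using only the standard library, under the assumption that "something with a domain" means a token starting with http://, https:// or www.; `URL_RE` and `domains_in` are hypothetical names, and the pattern is deliberately loose rather than a full URL grammar (trailing punctuation such as a sentence-ending period would stay attached to the host):

```python
import re
from urllib.parse import urlsplit

# Rough URL finder: a token starting with http://, https:// or www.
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def domains_in(text):
    """Find URL-like tokens in text and return their host names."""
    hosts = []
    for candidate in URL_RE.findall(text):
        # urlsplit only fills in .hostname when the netloc is marked by "//",
        # so add a scheme to scheme-less "www." tokens before parsing.
        if not candidate.startswith(("http://", "https://")):
            candidate = "http://" + candidate
        host = urlsplit(candidate).hostname  # lower-cased, port removed
        if host:
            # Drop a leading "www." to get closer to a "domain"
            hosts.append(host[4:] if host.startswith("www.") else host)
    return hosts


print(domains_in("Hi im Natsume, check out my site http://www.mysite.com/"))
# ['mysite.com']
print(domains_in("Mirrors: www.example.org/files and https://Docs.Example.net:443/"))
# ['example.org', 'docs.example.net']
```

Turning a host name like docs.example.net into a registrable domain is the "(possibly)" step; that needs the Public Suffix List, which is what tldextract provides.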

Upvotes: 1
