Ayush Kumar

Reputation: 532

How do I extract the domain name?

I am trying to extract the domain name from various websites. Here are the websites:

1. "www.xakep.ru"  should equal "xakep"

2. "http://www.fk3vmxex20vzn4ddp.info/default.html" should equal "fk3vmxex20vzn4ddp"

3. "https://hxin2wz7bkx9oicndd28y6m6i7n.us/img/" should equal "hxin2wz7bkx9oicndd28y6m6i7n"

4. "iccan.org" should equal "iccan"

5. "0iwb0awri.br/warez/" should equal "0iwb0awri"

6. "http://www.google.com/" should equal "google"

My code:

import re
url = "www.xakep.ru"
regex = re.compile(r'(://|www.)+([a-zA-Z-_0-9]+)')
match = regex.search(url)
print(match.group(2))

My regex fails on strings that contain neither "http" nor "www", such as "iccan.org".

Upvotes: 1

Views: 116

Answers (2)

anubhava

Reputation: 785128

You may use this regex, which has two optional non-capturing groups:

^(?:https?://)?(?:www\.)?([^.]+)


RegEx Details:

  • ^: Start of the string
  • (?:https?://)?: optionally match http:// or https://
  • (?:www\.)?: optionally match www.
  • ([^.]+): match one or more characters that are not a dot, captured in group #1
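As a minimal sketch, the regex above can be applied to the six inputs from the question like this. The names DOMAIN_RE and extract_domain are made up for illustration and are not part of the original answer.

```python
import re

# The pattern from the answer: optional scheme, optional "www.",
# then everything up to the first dot is captured in group 1.
DOMAIN_RE = re.compile(r'^(?:https?://)?(?:www\.)?([^.]+)')

def extract_domain(url):
    """Return the first host label after any scheme and "www.", or None."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None

urls = [
    "www.xakep.ru",
    "http://www.fk3vmxex20vzn4ddp.info/default.html",
    "https://hxin2wz7bkx9oicndd28y6m6i7n.us/img/",
    "iccan.org",
    "0iwb0awri.br/warez/",
    "http://www.google.com/",
]

for url in urls:
    print(url, "->", extract_domain(url))
```

Note that the pattern stops at the first dot, so a host with an extra subdomain (for example "blog.example.com") would yield "blog" rather than "example".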

Upvotes: 2

Christof

Reputation: 9

I know you asked for a regex, but I would normally not recommend parsing URLs "manually", because it is easy to get wrong.

The function you are looking for is in Python's urllib.parse module and should provide everything you need: https://docs.python.org/3/library/urllib.parse.html

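Once urlsplit gives you the hostname, getting the domain name from it is much easier than parsing the whole URL yourself. Keep in mind that urlsplit only recognizes the host when the URL has a scheme or starts with //, so inputs such as "iccan.org" need a "//" prefix first. But then, I might be lazy here.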
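A rough sketch of that approach follows. The helper name domain_label is hypothetical, and taking the first label after "www." is an assumption that happens to fit the six examples in the question. It is not a general way to find a registered domain.

```python
from urllib.parse import urlsplit

def domain_label(url):
    """Return the first host label after an optional leading "www."."""
    # urlsplit treats "iccan.org" as a path, not a host, unless the string
    # has a scheme or starts with "//", so add "//" when no scheme is present.
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    # Drop a leading "www." (assumption: it is never the wanted label).
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumption: the wanted name is the first remaining label.
    return host.split(".")[0]

for url in ["www.xakep.ru", "iccan.org", "0iwb0awri.br/warez/",
            "http://www.google.com/"]:
    print(url, "->", domain_label(url))
```

For hosts with extra subdomains or multi-part suffixes, the first-label rule would need to be replaced by a lookup against a public suffix list.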

Upvotes: 0
