dease

Reputation: 3076

Python - regexp to check if string is TLD domain

I have a form field that accepts a string representing a Polish domain name (one ending in .pl).

I need to check:

  1. whether the string is a valid Polish domain name (ends with .pl)
  2. whether the domain sits directly under .pl (e.g. domainname.pl) or one level deeper (e.g. domainname.net.pl, domainname.something.pl)

Do you have any suggestions for what such a regexp should look like?

Upvotes: 0

Views: 1592

Answers (2)

Gajo

Reputation: 66

If you really need a regex for that, I would go with something like this:

^([a-z0-9-]+\.)?([a-z0-9-]+)\.pl$
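As a minimal sketch of how that pattern could be used from Python: the dots are written as `\.` here so they match a literal dot rather than any character. The names `PL_DOMAIN_RE` and `is_pl_domain` are made up for illustration, and lowercasing the input first is an assumption about how the form data should be treated.

```python
import re

# Escaped dots match a literal "."; the optional first group allows
# one extra label in front, as in domainname.net.pl.
PL_DOMAIN_RE = re.compile(r"^([a-z0-9-]+\.)?([a-z0-9-]+)\.pl$")


def is_pl_domain(name):
    """Return True if name looks like label.pl or label.label.pl."""
    # The pattern only lists a-z, so normalise case and whitespace first.
    return PL_DOMAIN_RE.match(name.strip().lower()) is not None


for candidate in ("domainname.pl", "domainname.net.pl", "www.domainname.net.pl"):
    print(candidate, is_pl_domain(candidate))
```

This only checks the shape of the string. It does not know whether the middle label (net, something) is a real second-level suffix, and it does not enforce rules such as labels not starting or ending with a hyphen.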

Upvotes: 0

Martijn Pieters

Reputation: 1121904

You cannot match all possible top-level domains with a regex, and the list of what is a TLD changes from time to time.

Use a library to extract the TLD instead, like tldextract or publicsuffix.

Demo:

>>> import tldextract
>>> tldextract.extract('domainname.net.pl')
ExtractResult(subdomain='', domain='domainname', suffix='net.pl')
>>> tldextract.extract('www.domainname.net.pl')
ExtractResult(subdomain='www', domain='domainname', suffix='net.pl')
>>> from publicsuffix import PublicSuffixList
>>> psl = PublicSuffixList()
>>> psl.get_public_suffix('domainname.net.pl')
'domainname.net.pl'
>>> psl.get_public_suffix('www.domainname.net.pl')
'domainname.net.pl'

tldextract gives you a parsed result, while publicsuffix only returns the registrable domain: the public suffix plus one label, with any subdomains stripped.
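To illustrate the idea behind these libraries without depending on them, here is a rough standard-library sketch of a suffix-list lookup. `SAMPLE_PL_SUFFIXES` is a hypothetical, deliberately incomplete stand-in for the real Public Suffix List, and both function names are invented for this example; in practice you would let tldextract or publicsuffix supply the list.

```python
# Hypothetical, incomplete stand-in for the Public Suffix List entries under .pl.
SAMPLE_PL_SUFFIXES = {"pl", "com.pl", "net.pl", "org.pl"}


def split_pl_domain(name):
    """Return (subdomain, domain, suffix), or None if no sample suffix matches."""
    labels = name.strip().lower().rstrip(".").split(".")
    # Walk from the longest candidate suffix to the shortest,
    # so "net.pl" wins over plain "pl".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SAMPLE_PL_SUFFIXES:
            if i == 0:
                return None  # the name is a bare suffix, not a domain
            return ".".join(labels[:i - 1]), labels[i - 1], suffix
    return None


def is_registrable_pl_domain(name):
    """True if name is exactly one label in front of a known .pl suffix."""
    parts = split_pl_domain(name)
    return parts is not None and parts[0] == "" and parts[1] != ""


print(split_pl_domain("www.domainname.net.pl"))
print(is_registrable_pl_domain("domainname.net.pl"))
```

With this approach, domainname.something.pl is only accepted if something.pl is in the suffix list; otherwise it is treated as a subdomain of something.pl.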

Upvotes: 4
