Cy.

Reputation: 2145

Shell script to extract the domain extension from a list of domain names

I have a list of URLs (including http://), where some are just domain names and others include a full path.

How could I programmatically, using shell scripting, extract the extension (.com, .net, ...), taking into consideration that some extensions have more than one part, such as .co.uk?

Upvotes: 0

Views: 647

Answers (2)

Shizzmo

Reputation: 16907

Essentially, you'd need a list of everything you're considering a "TLD". There are a finite number of these. Then, for each URL, you'd check whether anything in your list matches the end of that URL's host name, and if so, print it out. The reason you need to construct the list yourself is that .co.uk is not a TLD: .uk is the TLD and .co is a subdomain.
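As a rough sketch of that list-based approach, the Python snippet below checks each host name against a hand-maintained list of suffixes, trying the longest suffix first so that .co.uk wins over .uk. The SUFFIXES list and the extract_suffix name are made up for illustration, and a real list would be far longer.

```python
from urllib.parse import urlparse

# Hypothetical, hand-maintained list; extend it with every suffix you care about.
SUFFIXES = [".co.uk", ".uk", ".com", ".net", ".ca", ".biz"]

def extract_suffix(url):
    """Return the longest listed suffix that ends the URL's host name, or None."""
    # hostname is None when the URL has no scheme, so fall back to "".
    host = (urlparse(url).hostname or "").lower()
    # Try longer suffixes first so ".co.uk" is preferred over ".uk".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if host.endswith(suffix):
            return suffix
    return None
```

Calling extract_suffix on each line of the input file and printing every non-None result gives the list of extensions. URLs whose suffix is not in the list are skipped instead of guessed at.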

Or you could construct an enormously long regex (for example, extracting .co.uk, .com, .ca, .biz):

$ perl -ne 'next unless /^http:\/\/[^ \/?]+(\.com|\.co\.uk|\.ca|\.biz)/; print $1, "\n"'
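For comparison, here is a Python sketch of the same regex idea. It differs from the one-liner in two ways that are assumptions on my part: it also accepts https://, and it requires the suffix to sit at the end of the host name, so http://foo.com.au/ is not reported as .com. The names SUFFIX_RE and extract_suffix_regex are illustrative only.

```python
import re

# The alternatives are illustrative; a real pattern would list every suffix.
SUFFIX_RE = re.compile(
    r"^https?://[^\s/?#]+?"           # start of the host name (lazy)
    r"(\.com|\.co\.uk|\.ca|\.biz)"    # the suffix we want to capture
    r"(?::\d+)?"                      # optional port number
    r"(?=[/?#]|$)",                   # host must end here
    re.IGNORECASE,
)

def extract_suffix_regex(url):
    """Return the matched suffix in lower case, or None if nothing matches."""
    match = SUFFIX_RE.match(url.strip())
    return match.group(1).lower() if match else None
```

The lookahead is what keeps a suffix in the middle of a host name from matching. Without it, the pattern would accept any host that merely contains one of the listed strings.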

Upvotes: 2

Hai Vu

Reputation: 40733

The most robust way is to use a library to parse the URL. For example, in Python:

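from urllib.parse import urlparse  # Python 3; in Python 2 use: from urlparse import urlparse
# netloc is the host part of the URL, here 'www.mydomain.co.uk'
domain = urlparse('http://www.mydomain.co.uk/path/to/file.html').netloc
tld = domain.split('.')[-1]  # last dot-separated label, here 'uk'
print(tld)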

This prints only the last label of the net location (uk here), which is what I think you meant by TLD in this case.

UPDATE: prints the TLD this time, instead of the whole domain.

Upvotes: 2
