Jai Mani
Jai Mani

Reputation: 1

Google Sheet Formula To Extract Domain From Different Format Site URLs

I have a Google spreadsheet where I have lists of URLs. I have formula to extract the domains from the URLs. But the issues is when a URL has multiple names in the domain. For example

I have attached the link to a sample doc and the tow formulae that I tried. These two formulas work perfectly in certain format and not on some other cases. If there is way to club these two or some way to understand the URL format and choose the best formula to extract the domain would be good. I tried by couldn't achieve the desired output. Google sheet link is given below.

Sample google sheet

Upvotes: 0

Views: 737

Answers (1)

Sergey
Sergey

Reputation: 1131

You can get by with just one formula, REGEXEXTRACT

First, we extract the hostname from the url. To do that, we use the following formula:

=REGEXEXTRACT(A2:A,"(?:www\.)?([\w._\-]{6,})")

Now, we extract the domain from the hostname. You can do it like this:

=REGEXEXTRACT(...hostname... ,"[\w_\-]+\.\w{0,4}\.?\w{0,4}$")

And now we build everything into a single array formula:

=ARRAYFORMULA(if(A2:A<>"",REGEXEXTRACT(REGEXEXTRACT(A2:A,"(?:www\.)?([\w._\-]{6,})"),"[\w_\-]+\.\w{0,4}\.?\w{0,4}$"),))

I don't pretend to be the best solution to your task - and probably someone will be able to tell you something simpler.

Upvotes: 1

Related Questions