Reputation: 1
I have a Google spreadsheet where I have lists of URLs. I have formula to extract the domains from the URLs. But the issues is when a URL has multiple names in the domain. For example
I have attached the link to a sample doc and the tow formulae that I tried. These two formulas work perfectly in certain format and not on some other cases. If there is way to club these two or some way to understand the URL format and choose the best formula to extract the domain would be good. I tried by couldn't achieve the desired output. Google sheet link is given below.
Upvotes: 0
Views: 737
Reputation: 1131
You can get by with just one formula, REGEXEXTRACT
First, we extract the hostname from the url. To do that, we use the following formula:
=REGEXEXTRACT(A2:A,"(?:www\.)?([\w._\-]{6,})")
Now, we extract the domain from the hostname. You can do it like this:
=REGEXEXTRACT(...hostname... ,"[\w_\-]+\.\w{0,4}\.?\w{0,4}$")
And now we build everything into a single array formula:
=ARRAYFORMULA(if(A2:A<>"",REGEXEXTRACT(REGEXEXTRACT(A2:A,"(?:www\.)?([\w._\-]{6,})"),"[\w_\-]+\.\w{0,4}\.?\w{0,4}$"),))
I don't pretend to be the best solution to your task - and probably someone will be able to tell you something simpler.
Upvotes: 1