Reputation: 936
I am trying to extract the domain name only from a cell in Google Sheets. The cell can have either a .com
or .co.uk
extension.
They are always presented as naked domains with www
or https://
.
Column A will contain a list of random URLs that will be displayed as aol.com
or amazon.co.uk
.
I have tried the following and many varients, I think essentially this is "remove everything BEFORE the dot" and regex actually uses a dot to perform its duty. Therefore, that is what is perhaps causing a potential conflict.
=REGEXEXTRACT(A4, "(.+).")
The result will eventually be piped into a QUERY where the company name is checked.
=query(Companies!A2:F,"select A where F contains '"&A2&"'")
Upvotes: 1
Views: 1030
Reputation: 626927
Since you may have only .com
or .co.uk
at the end of the strings, you may use
=REGEXEXTRACT(A4, "^(.+)\.(?:co\.uk|com)$")
See the regex demo.
Also, you may remove them at the end with
=REGEXREPLACE(A4, "\.(?:co\.uk|com)$", "")
You may also consider a bit more generic patterns like
=REGEXEXTRACT(A4, "^(.+?)(?:\.co)?\.[^.]+$")
=REGEXREPLACE(A4, "(?:\.co)?\.[^.]+$", "")
Pattern details
^
- start of string(.+)
- 1 or more chars other than line break chars, as many as possible(.+?)
- 1 or more chars other than line break chars, as few as possible (needed in the more generic patterns because the subsequent pattern is optional)\.(?:co\.uk|com)$
- .
and then co.uk
or com
at the end of the string(?:\.co)?\.[^.]+$
- an optional .co
char sequence and then .
and 1 or more chars other than a .
till the end of the string.Upvotes: 2