sigur7
sigur7

Reputation: 936

Extract domain name with no domain extension in Google Sheets

I am trying to extract the domain name only from a cell in Google Sheets. The cell can have either a .com or .co.uk extension.

They are always presented as naked domains with www or https://.

Column A will contain a list of random URLs that will be displayed as aol.com or amazon.co.uk.

I have tried the following and many varients, I think essentially this is "remove everything BEFORE the dot" and regex actually uses a dot to perform its duty. Therefore, that is what is perhaps causing a potential conflict.

=REGEXEXTRACT(A4, "(.+).")

The result will eventually be piped into a QUERY where the company name is checked.

=query(Companies!A2:F,"select A where F contains '"&A2&"'")

Upvotes: 1

Views: 1030

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626927

Since you may have only .com or .co.uk at the end of the strings, you may use

=REGEXEXTRACT(A4, "^(.+)\.(?:co\.uk|com)$")

See the regex demo.

Also, you may remove them at the end with

=REGEXREPLACE(A4, "\.(?:co\.uk|com)$", "")

See another regex demo

You may also consider a bit more generic patterns like

=REGEXEXTRACT(A4, "^(.+?)(?:\.co)?\.[^.]+$")
=REGEXREPLACE(A4, "(?:\.co)?\.[^.]+$", "")

Pattern details

  • ^ - start of string
  • (.+) - 1 or more chars other than line break chars, as many as possible
  • (.+?) - 1 or more chars other than line break chars, as few as possible (needed in the more generic patterns because the subsequent pattern is optional)
  • \.(?:co\.uk|com)$ - . and then co.uk or com at the end of the string
  • (?:\.co)?\.[^.]+$ - an optional .co char sequence and then . and 1 or more chars other than a . till the end of the string.

Upvotes: 2

Related Questions