Camilo

Reputation: 169

Regex to extract a part of a URL

I'm trying to extract the part of a URL that follows the http(s)://www. prefix, ignoring the prefix itself.

These URLs come from a form filled in by users, so multiple formats and errors are expected. Here's a sample:

http://www.akashicbooks.com 
https://deliciouselsalvador.com
http://altaonline.com
http://https://www.amtb-la.org/
http://https://www.amovacations.com/
http://dornsife.usc.edu/jep

I've tried the REGEXEXTRACT formula in Google Sheets and Airtable:

=REGEXEXTRACT({URL},"[^/]+$")

Unfortunately, I can't make it work for all of these cases.

Any ideas on how to make it work?

Upvotes: 1

Views: 555

Answers (1)

Wiktor Stribiżew

Reputation: 626845

You can use

^(?:https?://(?:www\.)?)*(.*)

See the regex demo. Details:

  • ^ - start of string
  • (?:https?://(?:www\.)?)* - zero or more occurrences of
    • https?:// - http:// or https://
    • (?:www\.)? - an optional sequence of www.
  • (.*) - Group 1: the rest of the string.

With REGEXEXTRACT, the output value is the text captured by Group 1, so the matched prefixes are dropped from the result.
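To check the pattern outside a spreadsheet, here is a minimal Python sketch that runs it against the sample URLs from the question. The function name strip_scheme_and_www is hypothetical, and the strip() call is an extra step that is not part of the regex (it only removes stray whitespace around the input). Python's re module is used here as a stand-in; for this particular pattern it should behave the same way as the engine behind REGEXEXTRACT.

```python
import re

# Same pattern as in the answer: skip any run of "http://" or "https://"
# prefixes (each optionally followed by "www."), then capture the rest.
PATTERN = re.compile(r"^(?:https?://(?:www\.)?)*(.*)")


def strip_scheme_and_www(url: str) -> str:
    """Return Group 1: the URL without its leading scheme/www prefixes.

    Hypothetical helper for illustration only.
    """
    # strip() is an addition to the regex: it drops whitespace around the input.
    match = PATTERN.match(url.strip())
    return match.group(1) if match else ""


if __name__ == "__main__":
    samples = [
        "http://www.akashicbooks.com",
        "https://deliciouselsalvador.com",
        "http://altaonline.com",
        "http://https://www.amtb-la.org/",
        "http://https://www.amovacations.com/",
        "http://dornsife.usc.edu/jep",
    ]
    for url in samples:
        print(f"{url} -> {strip_scheme_and_www(url)}")
```

Note that the capture keeps everything after the prefixes, so http://https://www.amtb-la.org/ yields amtb-la.org/ (trailing slash included) and http://dornsife.usc.edu/jep yields dornsife.usc.edu/jep. In Google Sheets the equivalent would presumably be =REGEXEXTRACT(A1,"^(?:https?://(?:www\.)?)*(.*)"), where A1 is an assumed cell reference.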

Upvotes: 1
