Reputation: 169
I'm trying to extract part of a URL, ignoring the leading http(s)://www. part.
These URLs come from a form that users fill in, so multiple formats and errors are expected. Here's a sample:
http://www.akashicbooks.com
https://deliciouselsalvador.com
http://altaonline.com
http://https://www.amtb-la.org/
http://https://www.amovacations.com/
http://dornsife.usc.edu/jep
I've tried the REGEXEXTRACT formula in Google Sheets and Airtable:
=REGEXEXTRACT({URL},"[^/]+$")
Unfortunately, I can't make it work for all cases.
Any ideas on how to make it work?
Upvotes: 1
Views: 555
Reputation: 626845
You can use
^(?:https?://(?:www\.)?)*(.*)
See the regex demo. Details:
^ - start of string
(?:https?://(?:www\.)?)* - zero or more occurrences of
    https?:// - http:// or https://
    (?:www\.)? - an optional sequence of www.
(.*) - Group 1: the rest of the string.

With REGEXEXTRACT, the output value is the text captured with Group 1.
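To check the pattern against the sample URLs from the question outside a spreadsheet, here is a minimal sketch in Python. Python's re module is an assumption here (the question uses Google Sheets and Airtable), and the helper name strip_prefix is hypothetical; only the pattern itself comes from the answer.

```python
import re

# Same pattern as in the answer: zero or more scheme prefixes, each optionally
# followed by "www.", then the rest of the string captured in Group 1.
PATTERN = re.compile(r"^(?:https?://(?:www\.)?)*(.*)")


def strip_prefix(url: str) -> str:
    """Return the URL without its leading http(s):// and www. prefixes.

    strip_prefix is a hypothetical helper name used for illustration only.
    """
    match = PATTERN.match(url)
    # Every part of the pattern may match nothing, so a match is always found
    # for a single-line string.
    return match.group(1)


if __name__ == "__main__":
    samples = [
        "http://www.akashicbooks.com",
        "https://deliciouselsalvador.com",
        "http://altaonline.com",
        "http://https://www.amtb-la.org/",
        "http://https://www.amovacations.com/",
        "http://dornsife.usc.edu/jep",
    ]
    for url in samples:
        print(f"{url} -> {strip_prefix(url)}")
```

Note that anything after the prefix is kept as-is, so a trailing slash or a path such as /jep stays in the result, and a URL that starts with www. but has no scheme is returned unchanged.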
Upvotes: 1