dellwarrior

Reputation: 43

Regex which captures url prefix but excludes www

I've been trying to wrap my head around regex usage in JavaScript (I'm not an expert), but I've been unable to solve this issue.

This is a pattern for my url:

https://www.prefix.site.com

And my current regex:

/(?:(\w+)\.)?site\.com

What I need is to capture the prefix that comes right before '.site', without including 'https://www.'. Both 'www.' and my prefix may or may not be present. An example of my prefix could be an environment, e.g. https://testing.site.com

The issue with the regex from above is that IF there is a 'www.' without my prefix, then it will capture the 'www.' as prefix and that's not what I need.
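To make the problem concrete, here is a minimal reproduction. It assumes the pattern is used as a regex literal with String.prototype.match; the helper name capturePrefix is made up for illustration.

```javascript
// The pattern from the question, written as a regex literal (assumed usage).
const pattern = /(?:(\w+)\.)?site\.com/;

// Hypothetical helper: returns capture group 1, or null when nothing was captured.
const capturePrefix = url => {
  const m = url.match(pattern);
  return m && m[1] !== undefined ? m[1] : null;
};

console.log(capturePrefix("https://testing.site.com"));    // "testing"
console.log(capturePrefix("https://www.prefix.site.com")); // "prefix"
console.log(capturePrefix("https://site.com"));            // null
console.log(capturePrefix("https://www.site.com"));        // "www" (the unwanted case)
```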

I kind of solved it with a negative lookbehind, but since that's not available in JavaScript, I cannot use it.

Any tips would be really appreciated!

Upvotes: 4

Views: 147

Answers (3)

Grafluxe

Reputation: 481

Per your needs, this expression matches only the prefix (with example.com standing in for site.com): (?!w{1,3}\.)[\w-]+(?=\.example)

https://regex101.com/r/X4L9ZZ/2

It supports dashes and still allows "w"s inside your prefix/sub-domain. Note that a prefix made up of only one to three "w"s (such as "w" or "ww") is skipped, just like "www".

Sample:

const getPrefix = uri => {
  const matched = uri.match(/(?!w{1,3}\.)[\w-]+(?=\.example)/);
  return matched && matched[0];
}

getPrefix("https://www.prefix.example.com"); // "prefix"
getPrefix("https://prefix.example.com"); // "prefix"
getPrefix("https://www.example.com"); // null
getPrefix("https://example.com"); // null

Good news is that "lookbehinds" will soon be fully supported in JS. It's already at stage 4 and just needs to be implemented across browsers! https://github.com/tc39/proposal-regexp-lookbehind
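For engines that do support lookbehind (it was standardized in ES2018), one possible shape for such a pattern is sketched below. This is only an illustration, not the asker's original lookbehind solution, which isn't shown; the name prefixViaLookbehind is hypothetical, and site.com is used as in the question.

```javascript
// Sketch: reject a captured label only when it is exactly "www".
// (?<!\bwww) fails when the three characters just consumed are a standalone "www".
const lookbehindPattern = /(?:\b(\w+)(?<!\bwww)\.)?site\.com/;

// Hypothetical helper: returns the captured prefix, or null if there is none.
const prefixViaLookbehind = url => {
  const m = url.match(lookbehindPattern);
  return m && m[1] !== undefined ? m[1] : null;
};

console.log(prefixViaLookbehind("https://www.prefix.site.com")); // "prefix"
console.log(prefixViaLookbehind("https://testing.site.com"));    // "testing"
console.log(prefixViaLookbehind("https://www.site.com"));        // null
console.log(prefixViaLookbehind("https://site.com"));            // null
```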

Upvotes: 1

CertainPerformance

Reputation: 370699

At the very beginning of the capturing group, you can negative lookahead for www. to ensure that the capturing group will only match if it contains something other than www.:

((?!www\.)\b\w+\.)?site\.com

https://regex101.com/r/K8btgd/1

Note the word boundary \b: it makes sure that the capturing group either starts right after a non-word character (like a / or a .) or doesn't match at all. Without it, the group could start at the second w of www.site.com and capture ww.
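A small usage sketch follows. One adaptation to be aware of: the pattern above captures the prefix together with its trailing dot ("prefix."), so this version moves the dot outside the capturing group. That change and the helper name prefixOf are assumptions for illustration, not part of the original pattern.

```javascript
// Same idea as above, but the dot sits outside the capturing group,
// so group 1 holds "prefix" rather than "prefix.".
const pattern = /(?:((?!www\.)\b\w+)\.)?site\.com/;

// Hypothetical helper: returns the captured prefix, or null if there is none.
const prefixOf = url => {
  const m = url.match(pattern);
  return m && m[1] !== undefined ? m[1] : null;
};

console.log(prefixOf("https://www.prefix.site.com")); // "prefix"
console.log(prefixOf("https://testing.site.com"));    // "testing"
console.log(prefixOf("https://www.site.com"));        // null
console.log(prefixOf("https://site.com"));            // null
```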

Upvotes: 2

Tim

Reputation: 2843

It sounds like the following would work for you:

https?://(?:w{3}\.)?(\w+)\.site\.com
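Two things to watch when trying this in JavaScript: inside a regex literal the slashes have to be escaped, and because the prefix group is required, https://www.site.com still matches by backtracking and captures "www". The sketch below compares the pattern with a variant whose prefix group is optional; the variant and the helper name firstGroup are assumptions for illustration.

```javascript
// Pattern from above, with the slashes escaped for a regex literal.
const original = /https?:\/\/(?:w{3}\.)?(\w+)\.site\.com/;
// Variant (an assumption, not part of the answer): the prefix group is optional.
const optionalPrefix = /https?:\/\/(?:www\.)?(?:(\w+)\.)?site\.com/;

// Hypothetical helper: group 1 of the first match, or null.
const firstGroup = (re, url) => {
  const m = url.match(re);
  return m && m[1] !== undefined ? m[1] : null;
};

console.log(firstGroup(original, "https://www.prefix.site.com"));       // "prefix"
console.log(firstGroup(original, "https://www.site.com"));              // "www"
console.log(firstGroup(optionalPrefix, "https://www.prefix.site.com")); // "prefix"
console.log(firstGroup(optionalPrefix, "https://testing.site.com"));    // "testing"
console.log(firstGroup(optionalPrefix, "https://www.site.com"));        // null
console.log(firstGroup(optionalPrefix, "https://site.com"));            // null
```

Anchoring at the scheme lets the optional www. be consumed first, so the prefix group never gets a chance to grab it.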

Upvotes: 1
