JS Regex: Parse urls with conditions

Question

I need to parse a set of urls and extract specific elements from them under special conditions. To explain it further, consider this set of urls:

http://www.example.com/appName1/some/extra/parts/keyword/rest/of/the/url
http://www.somewebsite.com/appName2/some/extra/parts/keyword/rest/of/the/url
http://www.someothersite.com/appname3/rest/of/the/url

As you can see, there are two sets of urls, one having the word "keyword" in it and others which don't. In my code, I will receive the part of the url after domain name (eg: /appName1/some/extra/parts/keyword/rest/of/the/url).

I have two tasks: first, check whether the word "keyword" is present in the url; second, only if "keyword" is not present, parse the url into two groups, the appName and the rest of the url (eg: grp 1. appname3 and grp 2. rest/of/the/url for url 3, as it doesn't have "keyword" in it). The whole thing should be done in one regex.

My progress:

I was able to parse the app name and rest of the url into groups, but was not able to apply the condition.
I found a way to select strings not having "keyword" in them, though I'm not sure if it's the right way to do it: ^((?!.*keyword).*)$
Next, to combine the above two, I tried something I found after a long search, which has the syntax (?(?=regex)then|else). The result was:
```
(?(?=^((?!.*keyword).*)$)\1)
```
But it says invalid group structure.

I had gone through many stackoverflow entries and tutorials, but couldn't reach the actual requirement. Please help me solve this.

Mathias-S · Accepted Answer

Yes, this is in fact possible. As far as I understand, you have the following cases:

/appName/some/extra/parts/keyword/rest/of/the/url
/appName/rest/of/the/url

You want your regex to not match the first one at all, while in the second case you want "appName" in one group and "rest/of/the/url" in another. The following regex will do that:

^(?!.*\/keyword\/)\/(.*?)\/(.*)$

Explanation:

^ asserts position at the start of the string
(?!.*\/keyword\/) is a negative lookahead, and looks ahead to make sure the string does not contain /keyword/. This is where the magic happens
\/ matches "/", i.e. the slash right after the domain name
(.*?)\/ captures the first group (appname in your example) lazily, i.e. as few characters as possible up to the next slash
(.*)$ is the group that captures "rest/of/the/url"
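
As a minimal sketch of how this could be used in JavaScript (the function name parseAppUrl and the shape of the returned object are assumptions made for illustration, not part of the original question):

```javascript
// Regex from the answer: reject any path containing "/keyword/",
// otherwise capture the app name and the rest of the path.
const pattern = /^(?!.*\/keyword\/)\/(.*?)\/(.*)$/;

// Hypothetical helper: returns null when the path contains "/keyword/"
// (or does not have the expected shape), else the two captured groups.
function parseAppUrl(path) {
  const match = pattern.exec(path);
  if (match === null) {
    return null;
  }
  return { appName: match[1], rest: match[2] };
}

// Contains "/keyword/", so the lookahead fails and nothing matches.
console.log(parseAppUrl("/appName1/some/extra/parts/keyword/rest/of/the/url"));

// No "/keyword/", so both groups are captured.
console.log(parseAppUrl("/appname3/rest/of/the/url"));
```

A null result covers the first task (the keyword is present), and a non-null result gives the two groups for the second. Note that the lookahead only rejects keyword when it appears as a path segment followed by a slash; a path that ends in /keyword with no trailing slash would still match.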
