Reputation: 409
I am new to Regex. I want to match a certain URL pagePath pattern for Analytics.
The Problem:
The pattern looks like this:
/(de|en|fr|it)/../any-word-including-dashes/word-or-words-including-dashes-and-numbers
I want to match only this pattern and exclude every pagePath that contains an additional forward slash or does not match the initial pattern:
Include:
/de/ab/word-word/word1-and-something-else
/de/ab/word-word/word1-and-something-else?any_ting1=any.-thing2
Exclude:
/de/ab/word-word/word1-and-something-else/
/de/ab/word-word/word1-and-something-else/anything
/de/ab/word-word
/fr/moreThanTwoCHAR/anything
My Regex:
After searching on SO (Exclude forward slash before end, "Match anything but", Finding exactly n occurrences of "/", and disallow 0 or more occurrences of a CHAR), I came up with the following regex:
^(\/de|\/fr|\/en|\/it)\/..\/.+\/\w+[^\/]*
What it does correctly
It correctly excludes the following path:
/fr/moreThanTwoCHAR/anything
What it fails on
The problem with the above regex is that it also matches (tested on regex101):
/de/ab/word-word/word1-and-something-else/anything
I can't understand why it matches the string with an additional forward slash, even though I stated that 0 or more additional occurrences should be excluded (at least, that is how I understood [^\/]*). Can anyone explain where I'm mistaken?
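One way to reproduce the regex101 result outside the browser is a small Python sketch (Python's re module is an assumption here; the question only mentions regex101) that runs the regex above against every path from the Include and Exclude lists:

```python
import re

# The regex from the question; in Python the slashes need no escaping.
original = re.compile(r"^(/de|/fr|/en|/it)/../.+/\w+[^/]*")

include = [
    "/de/ab/word-word/word1-and-something-else",
    "/de/ab/word-word/word1-and-something-else?any_ting1=any.-thing2",
]
exclude = [
    "/de/ab/word-word/word1-and-something-else/",
    "/de/ab/word-word/word1-and-something-else/anything",
    "/de/ab/word-word",
    "/fr/moreThanTwoCHAR/anything",
]

for path in include + exclude:
    # re.search looks for a match anywhere, like regex101's default;
    # the ^ anchor still pins the match to the start of the string.
    print(bool(original.search(path)), path)
```

It prints True for the first four paths and False for the last two, so both paths with an extra slash slip through.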
Upvotes: 1
Views: 339
Reputation: 626699
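Note that . matches any char (except line break chars if no DOTALL option (/s) is used), so your regex matches more types of input than you need. In particular, the .+ after /../ can consume the extra /, and because the pattern has no $ anchor, [^\/]* does not forbid a later /: the match simply stops before it.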
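Both effects can be seen in isolation with a short Python sketch. The capturing group around .+ is added purely for inspection, and the $-anchored variant is likewise only illustrative; neither is a regex proposed in this thread.

```python
import re

path = "/de/ab/word-word/word1-and-something-else/anything"

# The question's regex with an extra capturing group around .+ (added here
# only to show what .+ consumed).
probe = re.compile(r"^(/de|/fr|/en|/it)/../(.+)/\w+[^/]*")
print(probe.match(path).group(2))  # word-word/word1-and-something-else

# Adding only an end anchor is not enough: .+ still absorbs the extra slash,
# although the trailing-slash path is now rejected.
anchored = re.compile(r"^(/de|/fr|/en|/it)/../.+/\w+[^/]*$")
print(bool(anchored.match(path)))  # True
print(bool(anchored.match("/de/ab/word-word/word1-and-something-else/")))  # False
```

So the fix needs both an end anchor and a "no slash" class in place of the dot.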
You may use
'~^/(de|fr|en|it)/[^/]{2}(?:/[^/]+){2}$~'
See the regex demo.
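A minimal Python check of this pattern against the paths from the question, assuming the ~ characters are only delimiters (as used by, e.g., PHP's preg_* functions) and are dropped when the pattern is used elsewhere:

```python
import re

# The answer's pattern without the ~ delimiters.
page_path = re.compile(r"^/(de|fr|en|it)/[^/]{2}(?:/[^/]+){2}$")

# Each path from the question mapped to whether it should match.
samples = {
    "/de/ab/word-word/word1-and-something-else": True,
    "/de/ab/word-word/word1-and-something-else?any_ting1=any.-thing2": True,
    "/de/ab/word-word/word1-and-something-else/": False,
    "/de/ab/word-word/word1-and-something-else/anything": False,
    "/de/ab/word-word": False,
    "/fr/moreThanTwoCHAR/anything": False,
}

for path, expected in samples.items():
    matched = page_path.match(path) is not None
    print("ok " if matched == expected else "BAD", path)
```

Every line should print ok. In Python, $ also matches just before a trailing newline; if a pagePath value could end with one, re.fullmatch is the stricter choice.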
Pattern details:
^ - start of input
/ - a / char
(de|fr|en|it) - one of the four alternative substrings: de, fr, en or it
/[^/]{2} - / and then any 2 chars other than /
(?:/[^/]+){2} - 2 consecutive sequences of a / and then 1+ chars other than /
$ - end of input.
Upvotes: 1
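If the pattern is kept in code rather than pasted into a tool, the breakdown above can live inside the pattern itself. A sketch using Python's re.VERBOSE flag (Python is again only an assumption about the environment, and the name PAGE_PATH is hypothetical):

```python
import re

# Same pattern as in the answer; re.VERBOSE ignores the whitespace and lets
# each piece carry its explanation as a comment.
PAGE_PATH = re.compile(
    r"""
    ^                 # start of input
    /                 # a / char
    (de|fr|en|it)     # one of the four language codes, captured as group 1
    /[^/]{2}          # / and then exactly 2 chars other than /
    (?:/[^/]+){2}     # 2 sequences of / followed by 1+ chars other than /
    $                 # end of input
    """,
    re.VERBOSE,
)

m = PAGE_PATH.match("/fr/ab/word-word/word1-and-something-else")
print(m.group(1))  # fr
```

The capturing group also makes the language code available, which may be handy when post-processing the matched paths.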