Reputation: 23
I need to validate a file path. One of the directories can have a version number in it.
Following are the two possible kinds of path that I may encounter.
Path 1
File path = "/a/b/c/d_9000/p1=<val1>/p2=<val2>/p3=<val3>/<val4>"
Expected Output
Group 1 = d
Group 2 = 9000
Group 3 = val1
Group 4 = val2
Group 5 = val3
Group 6 = val4
Path 2
File Path = "/a/b/c/d/p1=<val1>/p2=<val2>/p3=<val3>/<val4>"
Expected Output
Group 1 = d
Group 2 = <null or empty string>
Group 3 = val1
Group 4 = val2
Group 5 = val3
Group 6 = val4
When each of these file paths is parsed, I need the above values in each group.
Following is what I have tried:
\/a\/b\/c\/(\w+)_([0-9]+)\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)
But this does not give me the right values for Group 1 and Group 2.
I tried adding a '?' after the underscore, but that does not help either.
Please help.
Upvotes: 1
Views: 24
Reputation: 626845
The problem is that \w matches letters, digits, or _, and it is quantified with +, a greedy quantifier. In (\w+)_?([0-9]+)\/ the \w+ part therefore grabs all the letters, digits, and underscores up to the / in d_9000/, then backtracks just far enough for the rest of the pattern to match: the optional _? matches an empty string, and only the last 0 lands in Group 2, since [0-9]+ must match at least one digit.
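The greedy behaviour described above can be reproduced with a short Python sketch. The sample values val1 to val4 stand in for the question's placeholders and are assumptions, as is the choice of Python's re module (the question does not name a regex flavour).

```python
import re

# The attempt from the question, with `?` added after the underscore.
ATTEMPT = re.compile(r"\/a\/b\/c\/(\w+)_?([0-9]+)\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)")

# Sample paths; val1..val4 are placeholder values (assumed).
path_with_version = "/a/b/c/d_9000/p1=val1/p2=val2/p3=val3/val4"
path_without_version = "/a/b/c/d/p1=val1/p2=val2/p3=val3/val4"

m = ATTEMPT.match(path_with_version)
# \w+ swallows "d_900"; only the final "0" is left for [0-9]+.
print(m.group(1), m.group(2))  # d_900 0

# Without a version, [0-9]+ has no digit to match, so the whole match fails.
print(ATTEMPT.match(path_without_version))  # None
```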
You may exclude _ from \w by using [^\W_], and make the _([0-9]+) pattern optional by wrapping it in an optional non-capturing group:
\/a\/b\/c\/([^\W_]+)(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)
            ^^^^^^^ ^^^         ^^
See this regex demo.
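As a quick check of this pattern against both kinds of path, here is a minimal Python sketch. The helper name parse and the sample values val1 to val4 are hypothetical, introduced only for illustration.

```python
import re

# [^\W_]+ is "\w without the underscore"; the version part is optional.
FIXED = re.compile(
    r"\/a\/b\/c\/([^\W_]+)(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)"
)

def parse(path):
    """Return the six captured groups as a tuple, or None if the path is invalid."""
    m = FIXED.fullmatch(path)
    return m.groups() if m else None

with_version = parse("/a/b/c/d_9000/p1=val1/p2=val2/p3=val3/val4")
without_version = parse("/a/b/c/d/p1=val1/p2=val2/p3=val3/val4")

print(with_version)     # ('d', '9000', 'val1', 'val2', 'val3', 'val4')
print(without_version)  # ('d', None, 'val1', 'val2', 'val3', 'val4')
```

In Python, a group that did not participate in the match is reported as None, which corresponds to the "null or empty string" expected for Group 2 in the question.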
Or, make the \w pattern lazy rather than subtracting _ from it (useful if there can be a _ other than the one before the digits you need to capture):
\/a\/b\/c\/(\w*?)(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)
            ^^^^
See another regex demo.
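The difference between the two variants only shows up when the directory name itself contains an underscore. The sketch below uses a hypothetical directory name my_dir_9000 and placeholder values val1 to val4 (both assumptions) to compare them in Python.

```python
import re

TAIL = r"(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)"
STRICT = re.compile(r"\/a\/b\/c\/([^\W_]+)" + TAIL)  # no underscores in the name
LAZY = re.compile(r"\/a\/b\/c\/(\w*?)" + TAIL)       # lazy \w, underscores allowed

# Hypothetical path whose directory name contains an extra underscore.
path = "/a/b/c/my_dir_9000/p1=val1/p2=val2/p3=val3/val4"

strict_match = STRICT.fullmatch(path)
lazy_match = LAZY.fullmatch(path)

print(strict_match)                            # None
print(lazy_match.group(1), lazy_match.group(2))  # my_dir 9000
```

Note that \w*? can also match an empty name, so a directory such as _9000 would be accepted with an empty Group 1; using \w+? instead should rule that out if an empty name is not valid.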
Upvotes: 1