Arvind I

Reputation: 23

Regular Expression for Optional characters

I need to validate a file path. One of the directories can have a version number in it.

These are the two kinds of path that I may encounter.

Path 1

File path = "/a/b/c/d_9000/p1=<val1>/p2=<val2>/p3=<val3>/<val4>"

Expected Output

Group 1 = d
Group 2 = 9000
Group 3 = val1
Group 4 = val2
Group 5 = val3
Group 6 = val4

Path 2

File path = "/a/b/c/d/p1=<val1>/p2=<val2>/p3=<val3>/<val4>"

Expected Output

Group 1 = d
Group 2 = <null or empty string>
Group 3 = val1
Group 4 = val2
Group 5 = val3
Group 6 = val4

When each of these file paths is parsed, I need the above values in each group.

This is what I have tried:

\/a\/b\/c\/(\w+)_([0-9]+)\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)

But this does not give me the right values for Group 1 and Group 2.

I tried adding a '?' after the underscore, but that does not help either.

Please help

Upvotes: 1

Views: 24

Answers (1)

Wiktor Stribiżew

Reputation: 626845

The problem is that \w matches letters, digits, and _. It is quantified with +, a greedy quantifier, so once the adjoining _ is made optional, _? simply matches an empty string and \w+ consumes the text it was meant to match. In (\w+)_?([0-9]+)\/, \w+ first grabs all the letters, digits, and _ up to the / in d_9000/, then backtracks by a single character because [0-9]+ must match at least one digit. As a result, Group 1 is d_900 and only the last 0 lands in Group 2.
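The backtracking described above can be reproduced with a short sketch. The question does not name a language, so Python's re module is an assumption here, and the pattern is reduced to the directory segment for brevity:

```python
import re

# The attempt with '?' added after the underscore, reduced to the
# directory segment. Python is an assumption; the question names no language.
greedy = re.compile(r"(\w+)_?([0-9]+)\/")

# \w+ swallows "d_9000", then gives back one character so [0-9]+ can match.
with_version = greedy.match("d_9000/")
print(with_version.groups())  # ('d_900', '0')

# Without a version there is no digit left for [0-9]+, so nothing matches.
without_version = greedy.match("d/")
print(without_version)  # None
```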

You may exclude a _ from \w using [^\W_] and make the _([0-9]+) pattern optional by wrapping it with an optional non-capturing group:

\/a\/b\/c\/([^\W_]+)(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)
            ^^^^^^^ ^^^         ^^   

See this regex demo.
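As a quick check of the [^\W_] pattern against both kinds of path, here is a minimal sketch in Python (an assumption, since the question does not name a language). The helper name parse_path and the sample values x, y, z and file.txt are hypothetical stand-ins for <val1> to <val4>:

```python
import re

# Pattern copied from the answer; the escaped slashes are harmless in Python.
PATH_RE = re.compile(
    r"\/a\/b\/c\/([^\W_]+)(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)"
)

def parse_path(path):
    """Return the six groups as a tuple, or None if the path does not match."""
    m = PATH_RE.match(path)
    return m.groups() if m else None

# Hypothetical sample values stand in for <val1>..<val4>.
versioned = parse_path("/a/b/c/d_9000/p1=x/p2=y/p3=z/file.txt")
unversioned = parse_path("/a/b/c/d/p1=x/p2=y/p3=z/file.txt")

print(versioned)    # ('d', '9000', 'x', 'y', 'z', 'file.txt')
print(unversioned)  # ('d', None, 'x', 'y', 'z', 'file.txt')
```

In Python, a group that did not take part in the match is reported as None, which corresponds to the "null or empty string" expected for Group 2 in Path 2.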

Or, if the directory name can contain a _ other than the one before the digits you need to capture, keep \w and make its quantifier lazy instead of subtracting _ from \w:

\/a\/b\/c\/(\w*?)(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)
            ^^^^

See another regex demo.
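The difference between the two patterns shows up when the directory name itself contains an underscore. The sketch below uses Python (an assumption, as the question names no language) and a hypothetical directory my_dir_9000 with made-up values x, y, z and file.txt:

```python
import re

TAIL = r"(?:_([0-9]+))?\/p1=(.*)\/p2=(.*)\/p3=(.*)\/(.*)"

# Lazy \w*? versus the [^\W_]+ variant, both taken from the answer.
LAZY_RE = re.compile(r"\/a\/b\/c\/(\w*?)" + TAIL)
STRICT_RE = re.compile(r"\/a\/b\/c\/([^\W_]+)" + TAIL)

# Hypothetical path whose directory name has an extra underscore.
path = "/a/b/c/my_dir_9000/p1=x/p2=y/p3=z/file.txt"

# The lazy quantifier expands only until _digits followed by / can match.
lazy_groups = LAZY_RE.match(path).groups()
print(lazy_groups)  # ('my_dir', '9000', 'x', 'y', 'z', 'file.txt')

# [^\W_]+ stops at the first underscore, so this path is rejected.
strict_match = STRICT_RE.match(path)
print(strict_match)  # None
```

Note that \w*? can also match an empty name, so a path such as /a/b/c/_9000/... would be accepted with an empty Group 1; using \w+? instead would require at least one character.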


Upvotes: 1
