vizcayno

Reputation: 1233

Why does the following regular expression not work?

I have the following command-line options, which I need to split into name-value pairs:

-table tab -delimiter "," -limit:10

The regular expression is:

(?<=[-{1,2}|/])(?<name>[a-zA-Z0-9_]*)[ |:|"]*(?<value>[0-9A-Za-z.?=&\|+ :'*(),\\]*)(?=[ |"]|$)

My problem is with the delimiter option. When I pass -delimiter "|" or -delimiter ":", the regular expression does not work even though I included these characters in the value class. In those cases name is "delimiter", but value is empty. Why?

Thanks for your help.

Edit: Tim and Gabe, thanks for your help. The final construction that works is:

(?<=-{1,2}|/)(?<name>[a-zA-Z0-9_]*)\s*:?\s*"? *(?<value>[0-9A-Za-z.?=&\|+ :'*(),\\]*)(?=[ "]|$)
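A quick way to check the final expression is to run it over the sample command line. The sketch below uses Python purely as an assumed test harness, since the question does not say which engine is in use. Python's re module needs (?P<name>...) for named groups and a fixed-width lookbehind, so (?<=-{1,2}|/) is written here as (?<=[-/]), which accepts the same positions because anything preceded by -- is also preceded by -. The names OPTION_RE and parse_options are hypothetical.

```python
import re

# Python adaptation of the final expression. Assumption: the original targets
# an engine with (?<name>...) groups and variable-width lookbehind.
OPTION_RE = re.compile(
    r"""(?<=[-/])(?P<name>[a-zA-Z0-9_]*)\s*:?\s*"? *"""
    r"""(?P<value>[0-9A-Za-z.?=&\|+ :'*(),\\]*)(?=[ "]|$)"""
)


def parse_options(command_line):
    """Return a list of (name, value) pairs found in command_line."""
    return [
        (m.group("name"), m.group("value"))
        for m in OPTION_RE.finditer(command_line)
    ]


print(parse_options('-table tab -delimiter "," -limit:10'))
# [('table', 'tab'), ('delimiter', ','), ('limit', '10')]
print(parse_options('-delimiter "|"'))
# [('delimiter', '|')]
```

Note that the value class contains a space, so the engine first grabs "tab " and then backtracks by one character to satisfy the trailing lookahead.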

Upvotes: 0

Views: 76

Answers (2)

Gabe

Reputation: 86768

Your problem is in [ |:|"]*. It looks as though you think | means "or" inside brackets, even though you don't use it that way in the next set of brackets. Inside a character class, | is just a literal pipe character.

You probably just want [ :"]*, which would make "|" work. Unfortunately, that class still matches any number of spaces, colons, and quotes, so in -delimiter ":" the whole of ":" is treated as part of the separator between the name and value, and nothing is left for the value. You will need to define more precisely which characters are allowed between the name and value.

I suggest: \s*:?\s*"? (any amount of space, followed by an optional colon, followed by any amount of space, followed by an optional quote).
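The difference between the two separators can be seen by checking how much of the text after the option name each one consumes. This is a minimal sketch in Python (an assumed harness, not necessarily the asker's engine); LOOSE_SEPARATOR, STRICT_SEPARATOR, and consumed are hypothetical names.

```python
import re

LOOSE_SEPARATOR = re.compile(r'[ :"]*')        # any run of space, colon, quote
STRICT_SEPARATOR = re.compile(r'\s*:?\s*"?')   # space, one colon, space, one quote


def consumed(separator, text):
    """Return the prefix of text that the separator pattern swallows."""
    return separator.match(text).group()


print(repr(consumed(LOOSE_SEPARATOR, ' "|"')))   # ' "'   (the pipe survives)
print(repr(consumed(LOOSE_SEPARATOR, ' ":"')))   # ' ":"' (nothing left for value)
print(repr(consumed(STRICT_SEPARATOR, ' ":"')))  # ' "'   (the colon survives)
print(repr(consumed(STRICT_SEPARATOR, ':10')))   # ':'
```

Because the strict form allows at most one colon and one quote, and in that order, a quoted ":" is left for the value group.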

Upvotes: 1

Tim Pietzcker

Reputation: 336418

There are a few errors in your regex:

[ |:|"]* matches zero or more of these characters: space, |, : or ". You seem to be using this to identify possible separators between name and value.

Of course, in -delimiter "|" or -delimiter ":" it matches all the characters after delimiter, leaving nothing for the value part. Since the value group is allowed to match the empty string, the regex still matches successfully, just with an empty value.
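The behaviour described above can be reproduced with the original expression. The sketch below is a Python adaptation (an assumption, since the question does not name the engine): only the named-group syntax is changed to (?P<name>...), and the character classes are left exactly as posted. ORIGINAL_RE and first_option are hypothetical names.

```python
import re

# The expression from the question, with only the group syntax adapted.
ORIGINAL_RE = re.compile(
    r"""(?<=[-{1,2}|/])(?P<name>[a-zA-Z0-9_]*)[ |:|"]*"""
    r"""(?P<value>[0-9A-Za-z.?=&\|+ :'*(),\\]*)(?=[ |"]|$)"""
)


def first_option(text):
    """Return (name, value) for the first option matched in text."""
    m = ORIGINAL_RE.search(text)
    return m.group("name"), m.group("value")


print(first_option('-delimiter ","'))  # ('delimiter', ',')
print(first_option('-delimiter "|"'))  # ('delimiter', '')
print(first_option('-delimiter ":"'))  # ('delimiter', '')
```

The comma case works only because a comma is not in the separator class, so the separator stops after the opening quote.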

Another thing:

You probably don't want

(?<=[-{1,2}|/])

but rather

(?<=-{1,2}|/)

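Inside square brackets, {1,2} and | are literal characters rather than a quantifier and an alternation, so it looks like you should read up on character classes.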
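To see what the bracketed form actually accepts, it helps to test single characters against it. This is a small Python sketch (an assumed harness); LOOKBEHIND_CLASS and the sample strings are made up for illustration.

```python
import re

# The class from the original lookbehind, tested one character at a time.
LOOKBEHIND_CLASS = re.compile(r"[-{1,2}|/]")

accepted = [c for c in "-{1,2}|/ab" if LOOKBEHIND_CLASS.fullmatch(c)]
print(accepted)
# ['-', '{', '1', ',', '2', '}', '|', '/']

# Consequence: a digit 1 is treated as an option prefix.
buggy = re.search(r"(?<=[-{1,2}|/])[a-zA-Z0-9_]+", "x1abc")
fixed = re.search(r"(?<=[-/])[a-zA-Z0-9_]+", "x1abc")
print(buggy.group())  # abc
print(fixed)          # None
```

Here (?<=[-/]) stands in for (?<=-{1,2}|/) because Python's re module only supports fixed-width lookbehind; both accept a position exactly when the preceding character is - or /.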

To fix your regex, we need to know the rules you're trying to implement. What exactly can separate a name/value pair?

Upvotes: 2
