jay76

Reputation: 25

Regular expressions: need to validate specific string format

I need to validate a semicolon-separated string:

Example:

;XYZ;2;200;event18=2.5;eVar12=Sale
  1. The opening semicolon must be present.
  2. The 'XYZ' section is mandatory and can be any combination of word or whitespace characters, of any length.
  3. The '2' section is mandatory and must be numeric, of unknown length.
  4. The '200' section is mandatory and must be numeric, of unknown length.
  5. The 'event18=2.5' section is optional. If present, the event number will always be a 1 or 2 digit number, and the value after the = sign will be a number of unknown length that might contain a decimal point.
  6. The 'eVar12=Sale' section is optional. If present, the eVar number will always be a 1 or 2 digit number, and the value after the = sign will be any combination of word characters and whitespace.

I've been banging away at this for a few hours now, but I'm quite the regex newb and I suspect the answer is fairly complex? Any help would be greatly appreciated.

Upvotes: 0

Views: 86

Answers (3)

Jerry

Reputation: 71538

You might try something like this:

^;[A-Za-z ]+(?:;[0-9]+){2}(?:;event[1-9][0-9]?=[0-9]+(?:\.[0-9]+)?)?(?:;eVar[1-9][0-9]?=[A-Za-z ]+)?$


But if by 'word character' you meant a letter, digit or underscore, as matched by \w, then you can use:

^;[\w ]+(?:;[0-9]+){2}(?:;event[1-9][0-9]?=[0-9]+(?:\.[0-9]+)?)?(?:;eVar[1-9][0-9]?=[\w ]+)?$
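
A quick way to check a pattern like this is to run it against the sample string plus a few inputs that should fail. The sketch below does that in Python with the \w variant. Only the first test string comes from the question; the others, and the helper name is_valid, are made up for illustration. Note that in Python, $ also matches just before a trailing newline.

```python
import re

# The \w variant from above, compiled once.
PATTERN = re.compile(
    r"^;[\w ]+(?:;[0-9]+){2}"
    r"(?:;event[1-9][0-9]?=[0-9]+(?:\.[0-9]+)?)?"
    r"(?:;eVar[1-9][0-9]?=[\w ]+)?$"
)

def is_valid(s):
    """Return True if s matches the expected format (hypothetical helper)."""
    return PATTERN.match(s) is not None

# Only the first sample comes from the question; the rest are hypothetical.
samples = [
    ";XYZ;2;200;event18=2.5;eVar12=Sale",  # full form
    ";XYZ;2;200",                          # both optional parts omitted
    ";XYZ;2;200;eVar12=Sale",              # only the eVar part
    "XYZ;2;200",                           # missing leading semicolon
    ";XYZ;2",                              # missing second number
    ";XYZ;2;200;event123=2.5",             # three-digit event number
]

for s in samples:
    print(is_valid(s), s)
```

The first three samples should be accepted and the last three rejected.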

Upvotes: 1

kol

Reputation: 28678

The pattern:

^;([\w\s]+);(\d+);(\d+)(?:;event(\d{1,2})=(\d+(?:\.\d+)?))?(?:;eVar(\d{1,2})=([\w\s]+))?$

JavaScript example:

var regex = /^;([\w\s]+);(\d+);(\d+)(?:;event(\d{1,2})=(\d+(?:\.\d+)?))?(?:;eVar(\d{1,2})=([\w\s]+))?$/,
    input = ";XYZ;2;200;event18=2.5;eVar12=Sale";

console.log(input.match(regex));
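
If the goal is to pull the fields out as well as validate them, named groups can be easier to read than numbered ones. The following Python sketch shows that idea and is not part of the original answer. The group names (name, num1, num2, event_no, event_val, evar_no, evar_val) and the function parse are invented for illustration, since the question does not say what each field means.

```python
import re

# Same field layout as the pattern above, with hypothetical group names.
# Here the decimal part is written as (?:\.\d+)? so that it is optional
# and the dot is matched literally.
PATTERN = re.compile(
    r"^;(?P<name>[\w\s]+);(?P<num1>\d+);(?P<num2>\d+)"
    r"(?:;event(?P<event_no>\d{1,2})=(?P<event_val>\d+(?:\.\d+)?))?"
    r"(?:;eVar(?P<evar_no>\d{1,2})=(?P<evar_val>[\w\s]+))?$"
)

def parse(s):
    """Return a dict of the captured fields, or None if s is invalid."""
    m = PATTERN.match(s)
    return m.groupdict() if m else None

print(parse(";XYZ;2;200;event18=2.5;eVar12=Sale"))
print(parse(";XYZ;2;200"))
```

Groups belonging to an optional section that is absent come back as None in the dictionary.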

Upvotes: 1

Birei

Reputation: 36252

I would use a CSV parser to split the fields and check each one individually, but as an alternative, here we go:

The regex is not hard at all. Basic knowledge of it can take you to the finish line.

To match any run of characters, as in point two, negate the separator in a character class: [^;]+.

Digits are \d, with an appropriate quantifier such as *, + or {...}.

For the optional parts, wrap each one together with its leading semicolon in a non-capturing group and make the group optional: (?:...)?

The result (Python version):

re.match(r';[^;]+;\d+;\d+(?:;event\d{1,2}=(?:\d+\.)?\d+)?(?:;eVar\d{1,2}=[^;]+)?$', string)

It should work, but if not, now you are ready to adapt it to your needs.
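
The split-and-check approach mentioned at the start of this answer could look roughly like the sketch below. It uses a plain str.split rather than a real CSV parser, which assumes that field values never contain a semicolon. The function name validate and the per-field patterns are made up for this example.

```python
import re

# One small pattern per field type; fullmatch anchors each one below.
NAME = re.compile(r"[\w\s]+")
NUMBER = re.compile(r"\d+")
EVENT = re.compile(r"event\d{1,2}=\d+(?:\.\d+)?")
EVAR = re.compile(r"eVar\d{1,2}=[\w\s]+")

def validate(s):
    """Split on ';' and check each field on its own (hypothetical helper)."""
    fields = s.split(";")
    # A leading semicolon produces an empty first field.
    if len(fields) < 4 or fields[0] != "":
        return False
    name, first, second, *optional = fields[1:]
    if not (NAME.fullmatch(name) and NUMBER.fullmatch(first)
            and NUMBER.fullmatch(second)):
        return False
    # Optional fields: an event, then an eVar, each at most once, in order.
    if optional and EVENT.fullmatch(optional[0]):
        optional = optional[1:]
    if optional and EVAR.fullmatch(optional[0]):
        optional = optional[1:]
    return not optional  # anything left over is unexpected

print(validate(";XYZ;2;200;event18=2.5;eVar12=Sale"))
print(validate(";XYZ;2;200;eVar12=Sale"))
print(validate(";XYZ;2;200;oops"))
```

Checking field by field makes it easier to report which part of the string is wrong, at the cost of a few more lines than a single pattern.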

Upvotes: 0
