Ghoul Fool

Reputation: 6949

Regex improvement match

I need some help improving a regex! In JavaScript, I have a regular expression that looks for pairs of numbers in a filename:

var nums = str.match(/[\d]{1,}[\d]{1,}/gi);

This will match

with (1200,627). I have tried to improve the regex, just in case there are more than two pairs of numbers, to look for the following: number (1 digit or more) + whitespace (1 or more) + x (zero times or once) + whitespace (1 or more) + number (1 digit or more).

This should fail on the second example (which uses a 'y' instead of an 'x'). I thought the regex would be:

[\d]{1,}[\s]?[x]?[\s]?[\d]{1,}

but it grabs all the digits in

with (1200,627,01), whereas I only want the first two numbers. I've written the code to deal only with the first two, but I was wondering where I was going wrong. Only a level 17 regex wizard can save me now! Thanks!

Upvotes: 2

Views: 53

Answers (3)

Xophmeister

Reputation: 9211

You say you want "one or more" whitespace characters around the "x", but you have used the ? quantifier, which means "zero or one". Because you've also marked the "x" as optional, the pattern will match any number of two or more digits: for 01, your first [\d]{1,} matches the 0 and your second one matches the 1.
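A minimal sketch to see that split for yourself, assuming a plain JavaScript runtime such as Node. It reuses the regex from the question, with two capture groups added purely for illustration so you can see which digits each [\d]{1,} consumed.

```javascript
// The question's pattern, with capture groups added only so we can see
// which digits each [\d]{1,} consumed (illustration, not a fix).
const questionRegex = /([\d]{1,})[\s]?[x]?[\s]?([\d]{1,})/;

const m = questionRegex.exec("01");
// Everything between the two digit runs is optional, so "01" matches:
// group 1 backtracks to "0" and group 2 takes "1".
console.log(m[0], m[1], m[2]); // 01 0 1
```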

Note that you do not need to enclose a single atom in a character class: [\d] can be written more simply as \d. Also, {1,} (meaning "one or more") is more easily written as +.
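A quick check that the two spellings are interchangeable. The sample string is hypothetical; any text gives the same result for both patterns.

```javascript
// Hypothetical sample text; any string works for this comparison.
const sample = "shot 1200 x 627 01";

const verbose = sample.match(/[\d]{1,}/g); // character class + {1,}
const terse = sample.match(/\d+/g);        // shorthand escape + +

// Both forms find exactly the same digit runs.
console.log(JSON.stringify(terse)); // ["1200","627","01"]
console.log(JSON.stringify(verbose) === JSON.stringify(terse)); // true
```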

As you want "one or more" whitespace characters on either side of the "x", I would go with:

\d+(?:(?:\s+x\s+)|\s+)\d+
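A small sketch of how this stricter pattern behaves, run against a few hypothetical filename fragments (not taken from the question). Because both groups are non-capturing, a successful exec returns a one-element array.

```javascript
// Whitespace is mandatory around the x; a bare whitespace run is the
// other alternative.
const strictRegex = /\d+(?:(?:\s+x\s+)|\s+)\d+/;

// Hypothetical filename fragments, not taken from the question.
const names = ["1200 x 627", "1200 627", "1200x627", "01"];
const results = names.map((name) => strictRegex.exec(name));

results.forEach((m, i) => {
  // (?: ... ) groups do not capture, so a match array has just one entry.
  console.log(JSON.stringify(names[i]), m ? JSON.stringify(m) : "no match");
});
```

The first two fragments match; "1200x627" (no spaces around the x) and "01" do not.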

Note that (?: ... ) is a "non-capture group", so these parts won't appear as separate entries in your match array. However, I don't think you actually want "one or more" whitespace characters, as that won't match your first example. Instead, try this:

\d+(?:(?:\s*x\s*)|\s+)\d+

Here the * quantifier means "zero or more".
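A sketch of the relaxed pattern with the g flag, assuming hypothetical filenames: one with no spaces around the x, one with spaces, one with a trailing 01, and one using a "y" like the question's second example.

```javascript
// Relaxed version: * makes the whitespace around the x optional.
const relaxedRegex = /\d+(?:(?:\s*x\s*)|\s+)\d+/g;

// Hypothetical filenames; the last one uses "y" instead of "x".
const names = ["1200x627.jpg", "1200 x 627.jpg", "1200 x 627 01.jpg", "1200 y 627.jpg"];
const found = names.map((name) => name.match(relaxedRegex));

// match() with the g flag returns every match, or null when there is none.
names.forEach((name, i) => console.log(name, JSON.stringify(found[i])));
```

The trailing 01 is no longer picked up, because a lone run of digits can't satisfy the separator, and the "y" name returns null.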

Upvotes: 0

skamazin

Reputation: 757

I used \d+\s?x?\s?\d+ as my regex (the same thing, just substituting + for {1,} and removing the unnecessary []). You can see the outcome of it here.

It matches the 01 because of all the ? quantifiers: the first \d+ matches one digit (0), then zero \s, zero x and zero \s, followed by the second \d+ matching another single digit (1).

The regex

(\d+)(?:\s?x\s?|\s)(\d+)

should do the trick. Test it here

(?:...) is a non-capture group, so it allows alternation without creating a backreference. This part matches the characters between the two numbers: either an x (optionally with a single whitespace character on each side) or a single whitespace character.
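Because the two numbers sit in capture groups 1 and 2, you can pull them straight out of the exec result. A minimal sketch, assuming hypothetical inputs; parsePair is an illustrative helper name, not part of any library.

```javascript
// skamazin's pattern: two capture groups around a non-capturing alternation.
const pairRegex = /(\d+)(?:\s?x\s?|\s)(\d+)/;

// Hypothetical helper: returns the first two numbers as [width, height],
// or null when no pair is found.
function parsePair(name) {
  const m = pairRegex.exec(name); // no g flag, so only the first pair
  return m ? [Number(m[1]), Number(m[2])] : null;
}

console.log(JSON.stringify(parsePair("1200x627")));      // [1200,627]
console.log(JSON.stringify(parsePair("1200 x 627 01"))); // [1200,627]
console.log(JSON.stringify(parsePair("1200 y 627")));    // null
```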

Upvotes: 1

hsz

Reputation: 152206

Just try the following regex:

(\d+)(?:(?: ?x ?)| )(\d+)
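One difference worth knowing: this pattern uses literal spaces where skamazin's uses \s, so it accepts only space characters, not tabs or other whitespace. A sketch comparing the two on hypothetical inputs:

```javascript
// hsz's pattern: literal spaces (" ") instead of \s around the x.
const spaceOnly = /(\d+)(?:(?: ?x ?)| )(\d+)/;
// skamazin's pattern, which uses \s (any whitespace, including tabs).
const anyWhitespace = /(\d+)(?:\s?x\s?|\s)(\d+)/;

// Hypothetical inputs: the same pair separated by spaces, then by tabs.
const inputs = ["1200 x 627", "1200\tx\t627"];
const results = inputs.map((s) => [spaceOnly.exec(s), anyWhitespace.exec(s)]);

results.forEach(([a, b], i) => {
  // Print only the two captured numbers, or null when there is no match.
  const show = (m) => (m ? JSON.stringify(m.slice(1)) : "null");
  console.log(JSON.stringify(inputs[i]), show(a), show(b));
});
```

With plain spaces both patterns capture 1200 and 627; with tabs only the \s version matches. Which one you want depends on whether your filenames can contain whitespace other than spaces.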


Upvotes: 0
