Tim Molendijk

Reputation: 1046

Regular expression does not match what I would expect it to match

Consider the following JavaScript regular expression matching operation:

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/(^|\s)mso.*?(\s|$)/ig);

I would expect it to return [" MsoClass2\t", "\tmsoclass3\t", " MSOclass4 ", " msoc5"]. Instead it returns [" MsoClass2\t", " MSOclass4 "].

Why?

Upvotes: 0

Views: 349

Answers (5)

Guffa

Reputation: 700212

Because the first match consumes the tab character, there is no whitespace character left before the second mso string. The same thing happens with the space after the second match.
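To see the consumption directly, here is a minimal sketch (an illustration, not part of the answer) that steps through the question's regex with `RegExp.prototype.exec`. For a regex with the `g` flag, `exec` resumes each search at `lastIndex`, the position right after the previous match:

```javascript
// Step through the question's global regex one match at a time and
// record where each match starts and where the next search resumes.
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";
const re = /(^|\s)mso.*?(\s|$)/gi;

const steps = [];
let m;
while ((m = re.exec(input)) !== null) {
  steps.push({ match: m[0], start: m.index, nextSearchFrom: re.lastIndex });
}

for (const s of steps) {
  console.log(
    JSON.stringify(s.match), "starts at", s.start,
    "- next search from", s.nextSearchFrom,
    "(" + JSON.stringify(input[s.nextSearchFrom]) + ")"
  );
}
// " MsoClass2\t" starts at 6 - next search from 17 ("m")
// " MSOclass4 " starts at 27 - next search from 38 ("m")
```

Both searches resume on the `m` of the next class name (indices 17 and 38), where neither `^` nor `\s` can match, so the engine moves on and msoclass3 and msoc5 are skipped.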

Perhaps you want to match word boundaries instead of the separating characters. This code:

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/\bmso.*?\b/ig)

will give you this result:

["MsoClass2","msoclass3","MSOclass4","msoc5"]

Upvotes: 2

Gumbo

Reputation: 655189

The tab character before msoclass3 has already been consumed by the first match, " MsoClass2\t". Maybe you want to use a non-consuming look-ahead assertion instead:

/(^|\s)mso[^\s]*(?=\s|$)/ig
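A minimal sketch checking the look-ahead version against the question's string, using the question's `g` and `i` flags (the `trim()` step is an addition, not part of the answer):

```javascript
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";

// The trailing whitespace is only checked by the look-ahead (?=\s|$), not
// consumed, so the tab after MsoClass2 is still available as the leading \s
// of the msoclass3 match.
const withLeading = input.match(/(^|\s)mso[^\s]*(?=\s|$)/gi);
console.log(JSON.stringify(withLeading));
// [" MsoClass2","\tmsoclass3"," MSOclass4"," msoc5"]

// Trim each match if only the class names are wanted.
const names = withLeading.map((s) => s.trim());
console.log(JSON.stringify(names));
// ["MsoClass2","msoclass3","MSOclass4","msoc5"]
```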

Upvotes: 2

Pascal MARTIN

Reputation: 400932

First, I am not sure you can use something like (^|\s) and (\s|$). Maybe you can, but I have to think to understand the regex, and it's never good when someone has to think to understand a regex: those are often far too complicated :-(

If you want to match words that begin with "mso", whether upper- or lowercase, I'd probably use something like this:

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/\s?(mso[^\s]*)\s?/ig);

Which gets you:

[" MsoClass2\t", "msoclass3\t", " MSOclass4 ", "msoc5"]

Which is almost what you asked for (there are a couple of whitespace differences).
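The parentheses around `mso[^\s]*` create a capture group, but `String.prototype.match` with the `g` flag returns only the full matches and discards groups. On an ES2020+ engine, `String.prototype.matchAll` exposes each group, so the surrounding whitespace can be dropped. A sketch, not part of the original answer:

```javascript
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";
const re = /\s?(mso[^\s]*)\s?/gi;

// match() with the g flag: full matches only, whitespace included.
console.log(JSON.stringify(input.match(re)));
// [" MsoClass2\t","msoclass3\t"," MSOclass4 ","msoc5"]

// matchAll() (requires the g flag) yields one result per match;
// m[1] is the text captured by (mso[^\s]*).
const names = Array.from(input.matchAll(re), (m) => m[1]);
console.log(JSON.stringify(names));
// ["MsoClass2","msoclass3","MSOclass4","msoc5"]
```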

Or, even simpler:

"class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5".match(/(mso[^\s]*)/ig);

Which gets you:

["MsoClass2", "msoclass3", "MSOclass4", "msoc5"]

Without any whitespace.

Easier to read and understand, too ;-)
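In the same spirit of readability, a different technique (a suggestion, not from this answer) needs only a trivial regex: split the string on whitespace and keep the tokens that start with "mso":

```javascript
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";

// Split on runs of whitespace (spaces, tabs, ...), then keep the tokens
// that start with "mso" in any letter case. Empty tokens, produced when the
// string has leading or trailing whitespace, fail the startsWith test.
const msoClasses = input
  .split(/\s+/)
  .filter((cls) => cls.toLowerCase().startsWith("mso"));

console.log(JSON.stringify(msoClasses));
// ["MsoClass2","msoclass3","MSOclass4","msoc5"]
```

Because each class name is tested from its first character, a name that merely contains mso in the middle is not picked up.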

Upvotes: 0

KJ Saxena

Reputation: 21828

This is because you require either ^ or \s (whitespace) at the start of each match, while there is no unconsumed whitespace left before msoclass3. To get the results you want, use the following inside match():

/mso.*?(\s|$)/ig
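A quick check of this pattern on the question's string (an illustration, not part of the answer). The trailing separator is still included in each match, and nothing requires "mso" to start a class name, as the hypothetical class `notmso1` shows:

```javascript
const input = "class1 MsoClass2\tmsoclass3\t MSOclass4 msoc5";
const re = /mso.*?(\s|$)/gi;

console.log(JSON.stringify(input.match(re)));
// ["MsoClass2\t","msoclass3\t","MSOclass4 ","msoc5"]

// "notmso1" is a hypothetical class name: the pattern still matches
// the "mso1" inside it, because no boundary is required before "mso".
console.log(JSON.stringify("class1 notmso1".match(re)));
// ["mso1"]
```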

Upvotes: 0

Simon Nickerson

Reputation: 43159

Because once it has matched " MsoClass2\t", the matcher is looking at the m in msoclass3, which matches neither the ^ nor the \s at the start of the pattern.

Upvotes: 0
