Reputation: 51
I'm using Node.js for a project, and I'm finding JavaScript's regex syntax very limiting. Specifically, the lack of lookbehind is killing me. I'm trying to use regex to split strings into sentences, but I want to check for common abbreviations such as Mr. and Mrs. so that I don't break the sentences up in the wrong place. Is there a Node.js library that adds regex features, and if not, what would be a good course of action?
Upvotes: 5
Views: 1851
Reputation: 2118
Node.js is built on the V8 engine, and its regex engine is part of V8. The V8 project is hosted here: https://code.google.com/p/v8/. Part of the regex engine's implementation is in this file: https://code.google.com/p/v8/source/browse/trunk/src/ia32/regexp-macro-assembler-ia32.cc?r=4966. You could in principle fork the project and add the desired features, but I suspect this would be more effort than it is worth.
Regular expressions are generally not designed for parsing. There are many parsing libraries for Node.js that can be found here: https://npmjs.org/search?q=language+parsing. I can personally recommend hot-cocoa (https://github.com/olleicua/hot-cocoa) as I made it myself and it worked perfectly for my purposes.
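To illustrate the parsing approach without depending on any particular library, here is a minimal hand-written sketch. The function name splitSentences and the ABBREVIATIONS list are made up for this example and are far from exhaustive; the idea is just to walk the tokens and treat a trailing period as a sentence end unless the token is a known abbreviation.

```javascript
// Illustrative abbreviation list (an assumption, not exhaustive).
const ABBREVIATIONS = new Set(['Mr.', 'Mrs.', 'Ms.', 'Dr.']);

// Hypothetical helper: split text into sentences without lookbehind.
function splitSentences(text) {
  const sentences = [];
  let current = [];
  // Walk the whitespace-separated tokens one at a time.
  for (const token of text.split(/\s+/).filter(Boolean)) {
    current.push(token);
    // A token ending in ., ! or ? closes the sentence,
    // unless the token is a known abbreviation.
    if (/[.!?]$/.test(token) && !ABBREVIATIONS.has(token)) {
      sentences.push(current.join(' '));
      current = [];
    }
  }
  // Keep any trailing text that lacks a final terminator.
  if (current.length > 0) sentences.push(current.join(' '));
  return sentences;
}

console.log(splitSentences('Mr. Potter met Mrs. Smith. They walked home.'));
// prints: [ 'Mr. Potter met Mrs. Smith.', 'They walked home.' ]
```

This collapses runs of whitespace into single spaces, which may or may not matter for your use case.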
Finally, if your goal is just to match any single word, or two words when the first one is 'Mr' or 'Mrs', then something like this might work:
var text = 'Mr Potter and Mrs Smith were walking to the house of Mrs Sullivan';
text.match(/(?:Mr |Mrs )?\w+/g);
// returns: [ 'Mr Potter', 'and', 'Mrs Smith', 'were', 'walking', 'to', 'the',
// 'house', 'of', 'Mrs Sullivan' ]
Upvotes: 1
Reputation: 89574
That is a known difficulty with JavaScript regexes.
One way to avoid your specific problem:
/(?:Mrs?\.|[^.])+/g // match a run of text whose only dots are the ones in Mr. or Mrs. (one character at a time, so "Mr." is tried at every position)
For more tricks, you can take a look at this site: http://blog.stevenlevithan.com/archives/javascript-regex-lookbehind
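As a rough sketch of how an alternation like this can be used to pull out whole sentences, the snippet below tries the abbreviation first at every position and otherwise consumes a single non-dot character, then takes the closing dot if there is one. The names SENTENCE and matchSentences are made up for this example, and only Mr. and Mrs. are handled.

```javascript
// Try "Mr." / "Mrs." first at every position; otherwise take one non-dot character.
// The trailing \.? picks up the sentence-ending dot when present.
const SENTENCE = /(?:Mrs?\.|[^.])+\.?/g;

// Hypothetical helper: return the trimmed sentences found in text.
function matchSentences(text) {
  const matches = text.match(SENTENCE) || [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

console.log(matchSentences('Hello Mr. Potter. Mrs. Smith is here.'));
// prints: [ 'Hello Mr. Potter.', 'Mrs. Smith is here.' ]
```

Matching one character per iteration matters here: a greedy [^.]+ would swallow "Hello Mr" in one step and then stop at the dot, before the abbreviation alternative ever gets a chance.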
Upvotes: 2