user2084028

Reputation: 51

Way to implement better regex in Node.js

I'm using Node.js for a project, and I'm finding JavaScript's regex syntax very limiting. Specifically, the lack of lookbehind is killing me. I'm trying to use regex to split strings into sentences, but I want to check for common abbreviations such as Mr. and Mrs. so that I don't break sentences at them. Is there a Node.js library that adds regex features, and if not, what would be a good course of action?

Upvotes: 5

Views: 1851

Answers (2)

olleicua

Reputation: 2118

Node.js is built on the V8 engine, and its regex engine is part of V8. The V8 project is hosted here: https://code.google.com/p/v8/. The regex engine's source includes this file: https://code.google.com/p/v8/source/browse/trunk/src/ia32/regexp-macro-assembler-ia32.cc?r=4966. You could in principle fork the project and add the desired features, but I suspect this would be more effort than it is worth.

Regular expressions are generally not designed for parsing. There are many parsing libraries for Node.js that can be found here: https://npmjs.org/search?q=language+parsing. I can personally recommend hot-cocoa (https://github.com/olleicua/hot-cocoa) as I made it myself and it worked perfectly for my purposes.
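To illustrate the non-regex route, here is a minimal sketch of a hand-written scanner that splits on periods but skips known abbreviations. It is not taken from hot-cocoa or any other library; the names splitSentences and ABBREVIATIONS are made up for this example, and the abbreviation list is only an assumption about what you need.

```javascript
// Hypothetical helper: split text into sentences without any lookbehind.
// ABBREVIATIONS is an illustrative list, not an exhaustive one.
const ABBREVIATIONS = new Set(['Mr', 'Mrs', 'Ms', 'Dr']);

function splitSentences(text) {
  const sentences = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== '.') continue;
    // Look at the word immediately before this period.
    const before = text.slice(start, i);
    const lastWord = before.slice(before.lastIndexOf(' ') + 1);
    // "Mr.", "Mrs." and friends do not end a sentence.
    if (ABBREVIATIONS.has(lastWord)) continue;
    sentences.push(text.slice(start, i + 1).trim());
    start = i + 1;
  }
  // Keep any trailing text that has no final period.
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

console.log(splitSentences('Mr. Potter met Mrs. Smith. They walked home.'));
// prints: [ 'Mr. Potter met Mrs. Smith.', 'They walked home.' ]
```

This only handles periods; question marks, exclamation marks, and abbreviations at the end of a sentence would need extra rules.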

Finally, if your goal is just to match any single word, or two words when the first one is 'Mr' or 'Mrs', then something like this might work:

var text = 'Mr Potter and Mrs Smith were walking to the house of Mrs Sullivan';
text.match(/(?:Mr |Mrs )?\w+/g);
// returns: [ 'Mr Potter', 'and', 'Mrs Smith', 'were', 'walking', 'to', 'the',
//            'house', 'of', 'Mrs Sullivan' ]

Upvotes: 1

Casimir et Hippolyte

Reputation: 89574

That is one of the difficulties with JavaScript regexes.

A way to avoid your specific problem:

/(?:Mrs?\.|[^.])+/g  // match runs made of "Mr.", "Mrs.", or any character that is not a dot
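Here is a small runnable sketch of that idea. The pattern is a variant that tries "Mr." or "Mrs." first at each position and otherwise consumes one non-dot character; the \b, the optional trailing \. and the name matchSentences are my own additions for illustration.

```javascript
// At each position: prefer "Mr." / "Mrs.", otherwise take one non-dot character.
// The trailing \.? keeps the sentence-ending period in the match.
const sentencePattern = /(?:\bMrs?\.|[^.])+\.?/g;

// Hypothetical helper that returns the trimmed matches.
function matchSentences(text) {
  const found = text.match(sentencePattern) || [];
  return found.map(function (s) { return s.trim(); });
}

console.log(matchSentences('Hello Mr. Potter. Say hi to Mrs. Smith.'));
// prints: [ 'Hello Mr. Potter.', 'Say hi to Mrs. Smith.' ]
```

Consuming one character per alternation matters here: a greedy [^.]+ would swallow the "Mr" before the abbreviation alternative gets a chance to match it together with its dot.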

For more tricks, you can take a look at this site: http://blog.stevenlevithan.com/archives/javascript-regex-lookbehind
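One common workaround (not necessarily the one described at that link) is to capture the would-be lookbehind text in an optional group and decide what to do in a replace callback. The sketch below assumes the input contains no newline characters, since it uses one as a temporary marker; markSentenceEnds, splitOnMarkedEnds and SEPARATOR are names invented for this example.

```javascript
// Roughly emulates "a period not preceded by Mr or Mrs" without lookbehind:
// capture the abbreviation optionally, then decide in the callback.
const SEPARATOR = '\n'; // hypothetical marker, assumed absent from the input

function markSentenceEnds(text) {
  return text.replace(/(\bMrs?)?\.\s*/g, function (match, abbreviation) {
    // If the abbreviation group matched, this period is not a sentence end,
    // so the text is put back unchanged.
    return abbreviation ? match : '.' + SEPARATOR;
  });
}

function splitOnMarkedEnds(text) {
  // filter(Boolean) drops the empty string left after the final marker.
  return markSentenceEnds(text).split(SEPARATOR).filter(Boolean);
}

console.log(splitOnMarkedEnds('Mr. Potter left. Mrs. Smith stayed.'));
// prints: [ 'Mr. Potter left.', 'Mrs. Smith stayed.' ]
```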

Upvotes: 2
