Reputation: 437
This is in the Node.js REPL.
> let re = new RegExp('[:punct:]*lipsticks[:punct:]*', 'i');
/[:punct:]*lipsticks[:punct:]*/i
> 'LipsticksGuava'.replace(re, '')
'Guava'
> 'LipsticksNaked'.replace(re, '')
'aked'
What happened to the N?
Revised my experiment based on feedback.
> re = new RegExp('[:punct:]*lipsticks[:punct:]*', 'i');
/[:punct:]*lipsticks[:punct:]*/i
> 'LipsticksNaked'.replace(re, '')
'aked'
> re = new RegExp('[[:punct:]]*lipsticks[[:punct:]]*', 'i');
/[[:punct:]]*lipsticks[[:punct:]]*/i
> 'LipsticksNaked'.replace(re, '')
'LipsticksNaked'
>
Upvotes: 0
Views: 610
Reputation: 627199
The JavaScript regex flavor does not support POSIX character classes such as [:punct:].
What is going on?
The /[:punct:]*lipsticks[:punct:]*/i regex matches:
- [:punct:]* - zero or more (due to *) characters from the set :, p, u, n, c or t, in a case insensitive way. This is an ordinary character class, not a POSIX one; in your case it matches an empty string before LipsticksNaked
- lipsticks - a literal string lipsticks, case insensitively
- [:punct:]* - (see the explanation above) this part matches N, since the letter is in the set inside the character class.
What happens if we try to use a POSIX character class in a bracket expression in JS, as [[:punct:]]?
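The [[:punct:]]* pattern is actually a sequence of 2 subpatterns:
- [[:punct:] - a character class matching one of the [, :, p, u, n, c, t characters
- ]* - zero or more closing square brackets
Thus, the whole [[:punct:]]*lipsticks[[:punct:]]* pattern successfully matches :LipsticksN in :LipsticksNaked. It finds no match in LipsticksNaked, because exactly one character from the class is required before lipsticks and nothing precedes it there.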
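Both behaviours described above can be checked directly. The following is a minimal sketch that reuses the strings from the question; the variable names are only illustrative.

```javascript
// Without POSIX support, [:punct:] is just the set { ':', 'p', 'u', 'n', 'c', 't' }.
const plainClass = /[:punct:]*lipsticks[:punct:]*/i;
const afterNaked = 'LipsticksNaked'.replace(plainClass, ''); // trailing class consumes "N"
const afterGuava = 'LipsticksGuava'.replace(plainClass, ''); // "G" is not in the set

// [[:punct:]]* parses as the class [[:punct:] followed by ]*, so exactly one
// character from the set is required on each side of "lipsticks".
const nestedClass = /[[:punct:]]*lipsticks[[:punct:]]*/i;
const matchesBare = nestedClass.test('LipsticksNaked'); // nothing precedes "Lipsticks"
const matchedText = ':LipsticksNaked'.match(nestedClass)[0];

console.log(afterNaked);  // aked
console.log(afterGuava);  // Guava
console.log(matchesBare); // false
console.log(matchedText); // :LipsticksN
```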
Any solution?
To match punctuation, you may use \p{P} with the XRegExp library:
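// Requires the XRegExp library (loaded by the script tag at the end of this snippet)
var str = ".;-LipstickNaked";
var regex = XRegExp('\\p{P}*lipstick\\p{P}*', 'ig');
var replaced = XRegExp.replace(str, regex, "");
console.log(replaced);
// or, if you cannot use XRegExp, spell out the \p{P} block by hand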
var pP_block = "(?:[\\x21-\\x23\\x25-\\x2A\\x2C-\\x2F\\x3A\\x3B\\x3F\\x40\\x5B-\\x5D\\x5F\\x7B\\x7D\\xA1\\xA7\\xAB\\xB6\\xB7\\xBB\\xBF\\u037E\\u0387\\u055A-\\u055F\\u0589\\u058A\\u05BE\\u05C0\\u05C3\\u05C6\\u05F3\\u05F4\\u0609\\u060A\\u060C\\u060D\\u061B\\u061E\\u061F\\u066A-\\u066D\\u06D4\\u0700-\\u070D\\u07F7-\\u07F9\\u0830-\\u083E\\u085E\\u0964\\u0965\\u0970\\u0AF0\\u0DF4\\u0E4F\\u0E5A\\u0E5B\\u0F04-\\u0F12\\u0F14\\u0F3A-\\u0F3D\\u0F85\\u0FD0-\\u0FD4\\u0FD9\\u0FDA\\u104A-\\u104F\\u10FB\\u1360-\\u1368\\u1400\\u166D\\u166E\\u169B\\u169C\\u16EB-\\u16ED\\u1735\\u1736\\u17D4-\\u17D6\\u17D8-\\u17DA\\u1800-\\u180A\\u1944\\u1945\\u1A1E\\u1A1F\\u1AA0-\\u1AA6\\u1AA8-\\u1AAD\\u1B5A-\\u1B60\\u1BFC-\\u1BFF\\u1C3B-\\u1C3F\\u1C7E\\u1C7F\\u1CC0-\\u1CC7\\u1CD3\\u2010-\\u2027\\u2030-\\u2043\\u2045-\\u2051\\u2053-\\u205E\\u207D\\u207E\\u208D\\u208E\\u2308-\\u230B\\u2329\\u232A\\u2768-\\u2775\\u27C5\\u27C6\\u27E6-\\u27EF\\u2983-\\u2998\\u29D8-\\u29DB\\u29FC\\u29FD\\u2CF9-\\u2CFC\\u2CFE\\u2CFF\\u2D70\\u2E00-\\u2E2E\\u2E30-\\u2E42\\u3001-\\u3003\\u3008-\\u3011\\u3014-\\u301F\\u3030\\u303D\\u30A0\\u30FB\\uA4FE\\uA4FF\\uA60D-\\uA60F\\uA673\\uA67E\\uA6F2-\\uA6F7\\uA874-\\uA877\\uA8CE\\uA8CF\\uA8F8-\\uA8FA\\uA8FC\\uA92E\\uA92F\\uA95F\\uA9C1-\\uA9CD\\uA9DE\\uA9DF\\uAA5C-\\uAA5F\\uAADE\\uAADF\\uAAF0\\uAAF1\\uABEB\\uFD3E\\uFD3F\\uFE10-\\uFE19\\uFE30-\\uFE52\\uFE54-\\uFE61\\uFE63\\uFE68\\uFE6A\\uFE6B\\uFF01-\\uFF03\\uFF05-\\uFF0A\\uFF0C-\\uFF0F\\uFF1A\\uFF1B\\uFF1F\\uFF20\\uFF3B-\\uFF3D\\uFF3F\\uFF5B\\uFF5D\\uFF5F-\\uFF65]|\\uD802[\\uDC57\\uDD1F\\uDD3F\\uDE50-\\uDE58\\uDE7F\\uDEF0-\\uDEF6\\uDF39-\\uDF3F\\uDF99-\\uDF9C]|\\uD809[\\uDC70-\\uDC74]|\\uD805[\\uDCC6\\uDDC1-\\uDDD7\\uDE41-\\uDE43\\uDF3C-\\uDF3E]|\\uD836[\\uDE87-\\uDE8B]|\\uD801\\uDD6F|\\uD82F\\uDC9F|\\uD804[\\uDC47-\\uDC4D\\uDCBB\\uDCBC\\uDCBE-\\uDCC1\\uDD40-\\uDD43\\uDD74\\uDD75\\uDDC5-\\uDDC9\\uDDCD\\uDDDB\\uDDDD-\\uDDDF\\uDE38-\\uDE3D\\uDEA9]|\\uD800[\\uDD00-\\uDD02\\uDF9F\\uDFD0]|\\uD81A[\\uDE6E\\uDE6F\\uDEF5\\uDF37-\\uDF3B\\uDF44])*";
var re2 = RegExp(pP_block + "lipstick" + pP_block, "gi");
console.log(str.replace(re2, ""));
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>
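As a further option that the answer above does not cover: runtimes implementing ES2018 Unicode property escapes (recent Node.js versions and current browsers) accept \p{P} natively when the u flag is set, so no library is needed there. The sketch below assumes such a runtime. It also shows an ASCII-only class that approximates POSIX [:punct:] in the C locale; the variable names are illustrative.

```javascript
// Native Unicode property escape; the u flag is required for \p{...}.
const unicodePunct = /\p{P}*lipstick\p{P}*/giu;
const cleaned = '.;-LipstickNaked'.replace(unicodePunct, '');

// ASCII-only approximation of POSIX [:punct:]: the ranges ! to /, : to @,
// [ to ` and { to ~ cover every printable ASCII non-alphanumeric character.
const asciiPunct = /[!-\/:-@\[-`{-~]*lipstick[!-\/:-@\[-`{-~]*/gi;
const cleanedAscii = '$;-Lipstick!Naked'.replace(asciiPunct, '');

console.log(cleaned);      // Naked
console.log(cleanedAscii); // Naked
```

Note that \p{P} covers punctuation only. Symbols such as $ or +, which POSIX [:punct:] includes, belong to \p{S}, so the two options are not interchangeable.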
In regex flavors that do support POSIX character classes (JavaScript is not one of them), [:punct:] must be used inside a bracket expression, as [[:punct:]]. On its own, [:punct:] works as a plain bracket expression and matches either a colon, or p, or u, or n, or c, or t. That is why the N is removed: case insensitive matching is enabled with the i modifier, so n also matches N.
Upvotes: 1