user2171796

Reputation: 437

Character class [:punct:] doesn't seem to work correctly

This is in the Node.js REPL.

> let re = new RegExp('[:punct:]*lipsticks[:punct:]*', 'i');
/[:punct:]*lipsticks[:punct:]*/i
> 'LipsticksGuava'.replace(re, '')
'Guava'
> 'LipsticksNaked'.replace(re, '')
'aked'

What happened to the N?


Revised my experiment based on feedback.

> re = new RegExp('[:punct:]*lipsticks[:punct:]*', 'i');
/[:punct:]*lipsticks[:punct:]*/i
> 'LipsticksNaked'.replace(re, '')
'aked'
> re = new RegExp('[[:punct:]]*lipsticks[[:punct:]]*', 'i');
/[[:punct:]]*lipsticks[[:punct:]]*/i
> 'LipsticksNaked'.replace(re, '')
'LipsticksNaked'
> 

Upvotes: 0

Views: 610

Answers (1)

Wiktor Stribiżew

Reputation: 627199

JS flavor

The JavaScript regex flavor does not support POSIX character classes such as [:punct:].

What is going on?

The /[:punct:]*lipsticks[:punct:]*/i regex matches

  • [:punct:]* - an ordinary character class matching zero or more (due to *) characters from the set :, p, u, n, c, t, case insensitively (in your case it matches the empty string before LipsticksNaked)
  • lipsticks - the literal string lipsticks, case insensitively
  • [:punct:]* - the same class again; this part matches N, since n is in the class and matching is case insensitive.
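The breakdown above can be checked directly in Node.js. This is a minimal sketch; the variable names are only illustrative and are not part of the original question.

```javascript
// Outside a bracket expression, [:punct:] is a plain character class
// containing the characters ':', 'p', 'u', 'n', 'c' and 't'.
const plainClass = /^[:punct:]$/i;

// Letters from the word "punct" (in either case) and the colon match.
const classHits = ['N', 'p', ':', 'T'].map((ch) => plainClass.test(ch));
// Real punctuation and other letters do not.
const classMisses = ['!', '.', 'G', 'a'].map((ch) => plainClass.test(ch));

// The pattern from the question, with the matched text made visible.
const questionRe = /[:punct:]*lipsticks[:punct:]*/i;
const matchNaked = 'LipsticksNaked'.match(questionRe)[0]; // 'LipsticksN'
const matchGuava = 'LipsticksGuava'.match(questionRe)[0]; // 'Lipsticks'

console.log(classHits, classMisses, matchNaked, matchGuava);
```

The N disappears only because it happens to be one of the letters in the word "punct"; G is not, so LipsticksGuava is unaffected.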

What happens if we try to use a POSIX character class inside a bracket expression in JS, as [[:punct:]]?

The [[:punct:]] pattern is actually a sequence of 2 subpatterns:

  • [[:punct:] - a character class matching [, :, p, u, n, c, t characters
  • ]* - zero or more closing square brackets

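Thus, the whole [[:punct:]]*lipsticks[[:punct:]]* pattern successfully matches :LipsticksN in :LipsticksNaked. It does not match LipsticksNaked at all, because the leading [[:punct:] class must consume exactly one character before lipsticks; that is why the string came back unchanged in your revised experiment.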
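A short sketch of how JavaScript parses the bracketed form, using the pattern from the revised experiment in the question. The third input string is a made-up example chosen to show the ]* part consuming closing brackets.

```javascript
// JS reads [[:punct:]]* as the class [[:punct:] (exactly one character
// from '[', ':', 'p', 'u', 'n', 'c', 't') followed by ]* (zero or more ']').
const bracketRe = /[[:punct:]]*lipsticks[[:punct:]]*/i;

// One class character on each side of "lipsticks" is required.
const withColon = ':LipsticksNaked'.match(bracketRe)[0]; // ':LipsticksN'

// No class character precedes "Lipsticks", so there is no match at all.
const withoutPrefix = 'LipsticksNaked'.match(bracketRe); // null

// Hypothetical input: literal ']' characters are consumed by the ]* parts.
const withBrackets = '[]]Lipsticksn]]x'.match(bracketRe)[0]; // '[]]Lipsticksn]]'

console.log(withColon, withoutPrefix, withBrackets);
```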

Any solution?

To match punctuation you may use XRegExp \p{P}:

var str = ".;-LipstickNaked";
// XRegExp adds \p{P} (Unicode punctuation) support
var regex = XRegExp('\\p{P}*lipstick\\p{P}*', 'ig');
var replaced = XRegExp.replace(str, regex, "");
console.log(replaced);
// or, if you cannot use XRegExp, spell out the \p{P} ranges by hand
var pP_block = "(?:[\\x21-\\x23\\x25-\\x2A\\x2C-\\x2F\\x3A\\x3B\\x3F\\x40\\x5B-\\x5D\\x5F\\x7B\\x7D\\xA1\\xA7\\xAB\\xB6\\xB7\\xBB\\xBF\\u037E\\u0387\\u055A-\\u055F\\u0589\\u058A\\u05BE\\u05C0\\u05C3\\u05C6\\u05F3\\u05F4\\u0609\\u060A\\u060C\\u060D\\u061B\\u061E\\u061F\\u066A-\\u066D\\u06D4\\u0700-\\u070D\\u07F7-\\u07F9\\u0830-\\u083E\\u085E\\u0964\\u0965\\u0970\\u0AF0\\u0DF4\\u0E4F\\u0E5A\\u0E5B\\u0F04-\\u0F12\\u0F14\\u0F3A-\\u0F3D\\u0F85\\u0FD0-\\u0FD4\\u0FD9\\u0FDA\\u104A-\\u104F\\u10FB\\u1360-\\u1368\\u1400\\u166D\\u166E\\u169B\\u169C\\u16EB-\\u16ED\\u1735\\u1736\\u17D4-\\u17D6\\u17D8-\\u17DA\\u1800-\\u180A\\u1944\\u1945\\u1A1E\\u1A1F\\u1AA0-\\u1AA6\\u1AA8-\\u1AAD\\u1B5A-\\u1B60\\u1BFC-\\u1BFF\\u1C3B-\\u1C3F\\u1C7E\\u1C7F\\u1CC0-\\u1CC7\\u1CD3\\u2010-\\u2027\\u2030-\\u2043\\u2045-\\u2051\\u2053-\\u205E\\u207D\\u207E\\u208D\\u208E\\u2308-\\u230B\\u2329\\u232A\\u2768-\\u2775\\u27C5\\u27C6\\u27E6-\\u27EF\\u2983-\\u2998\\u29D8-\\u29DB\\u29FC\\u29FD\\u2CF9-\\u2CFC\\u2CFE\\u2CFF\\u2D70\\u2E00-\\u2E2E\\u2E30-\\u2E42\\u3001-\\u3003\\u3008-\\u3011\\u3014-\\u301F\\u3030\\u303D\\u30A0\\u30FB\\uA4FE\\uA4FF\\uA60D-\\uA60F\\uA673\\uA67E\\uA6F2-\\uA6F7\\uA874-\\uA877\\uA8CE\\uA8CF\\uA8F8-\\uA8FA\\uA8FC\\uA92E\\uA92F\\uA95F\\uA9C1-\\uA9CD\\uA9DE\\uA9DF\\uAA5C-\\uAA5F\\uAADE\\uAADF\\uAAF0\\uAAF1\\uABEB\\uFD3E\\uFD3F\\uFE10-\\uFE19\\uFE30-\\uFE52\\uFE54-\\uFE61\\uFE63\\uFE68\\uFE6A\\uFE6B\\uFF01-\\uFF03\\uFF05-\\uFF0A\\uFF0C-\\uFF0F\\uFF1A\\uFF1B\\uFF1F\\uFF20\\uFF3B-\\uFF3D\\uFF3F\\uFF5B\\uFF5D\\uFF5F-\\uFF65]|\\uD802[\\uDC57\\uDD1F\\uDD3F\\uDE50-\\uDE58\\uDE7F\\uDEF0-\\uDEF6\\uDF39-\\uDF3F\\uDF99-\\uDF9C]|\\uD809[\\uDC70-\\uDC74]|\\uD805[\\uDCC6\\uDDC1-\\uDDD7\\uDE41-\\uDE43\\uDF3C-\\uDF3E]|\\uD836[\\uDE87-\\uDE8B]|\\uD801\\uDD6F|\\uD82F\\uDC9F|\\uD804[\\uDC47-\\uDC4D\\uDCBB\\uDCBC\\uDCBE-\\uDCC1\\uDD40-\\uDD43\\uDD74\\uDD75\\uDDC5-\\uDDC9\\uDDCD\\uDDDB\\uDDDD-\\uDDDF\\uDE38-\\uDE3D\\uDEA9]|\\uD800[\\uDD00-\\uDD02\\uDF9F\\uDFD0]|\\uD81A[\\uDE6E\\uDE6F\\uDEF5\\uDF37-\\uDF3B\\uDF44])*";
var re2 = RegExp(pP_block + "lipstick" + pP_block, "gi");
console.log(str.replace(re2, ""));
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/2.0.0/xregexp-all-min.js"></script>
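As an alternative to the XRegExp approach above, engines that implement ES2018 Unicode property escapes accept \p{P} natively when the u flag is set. The sketch below assumes such an engine (for example Node.js 10 or later); the variable names are illustrative.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes.
// The u flag is required; without it, \p is not a property escape.
const punctRe = /\p{P}*lipstick\p{P}*/giu;

const cleaned = '.;-LipstickNaked'.replace(punctRe, ''); // 'Naked'

// \p{P} covers the Unicode Punctuation categories only. Characters such
// as '$' and '+' belong to the Symbol categories (\p{S}) instead.
const dollarIsPunctuation = /\p{P}/u.test('$'); // false
const dollarIsSymbol = /\p{S}/u.test('$');      // true

console.log(cleaned, dollarIsPunctuation, dollarIsSymbol);
```

If symbols should be stripped as well, a class such as [\p{P}\p{S}] can be used in place of \p{P}.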

POSIX flavor

In a POSIX regex flavor, the [:punct:] character class must be used inside a bracket expression, as [[:punct:]]. On its own, [:punct:] is itself a bracket expression that matches a colon, p, u, n, c or t. That is why the N is removed: the i modifier enables case insensitive matching, so the n in the set also matches N.
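JavaScript has no [[:punct:]], but for ASCII-only input the class can be approximated with explicit ranges. The sketch below assumes the C/POSIX locale, where [[:punct:]] covers the 32 printable ASCII characters that are neither alphanumeric nor a space; ASCII_PUNCT and the other names are hypothetical helpers, not part of any library.

```javascript
// Assumption: C/POSIX locale. The four ranges are
// '!'..'/', ':'..'@', '['..'`' and '{'..'~'.
const ASCII_PUNCT = '[!-/:-@\\[-`{-~]';

// Sanity check: count how many ASCII characters the class accepts.
const single = new RegExp('^' + ASCII_PUNCT + '$');
let punctCount = 0;
for (let code = 0; code < 128; code++) {
  if (single.test(String.fromCharCode(code))) punctCount++;
}

// The pattern from the question, rebuilt with the ASCII approximation.
const posixLike = new RegExp(ASCII_PUNCT + '*lipsticks' + ASCII_PUNCT + '*', 'i');

const plain = 'LipsticksNaked'.replace(posixLike, '');        // 'Naked'
const decorated = '--Lipsticks!?Naked'.replace(posixLike, ''); // 'Naked'

console.log(punctCount, plain, decorated);
```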

Upvotes: 1
