ryan

Reputation: 6655

Regex matching the boundary of a word that may contain punctuation

I am using regular expressions to manipulate a space-delimited list of tags. When a user wants to delete a tag, the regex replaces the tag to be deleted with an empty string and the system saves the new list. This ran into a snag when users started entering punctuation as part of a tag (a valid use case). Once I realized punctuation was being used, I started escaping tags with this helper:

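// Escape every regex metacharacter so the tag is matched literally.
// Note: newer JavaScript engines ship a built-in RegExp.escape; this assignment overwrites it.
RegExp.escape = function(s){
  return String(s).replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');
};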

Combined with my existing pattern (\bTAGTODELETE\b), this breaks in certain scenarios: it either matches the wrong tag or matches nothing.
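A minimal sketch of the deletion step described above, assuming the tag list is a plain string and "deleting" means replacing the match with an empty string. `escapeRegExp` and `deleteTag` are hypothetical names; `escapeRegExp` uses the same character set as the `RegExp.escape` helper but is kept local so it does not overwrite the built-in.

```javascript
// Hypothetical local helper with the same character set as RegExp.escape in
// the question; a local name avoids overwriting a built-in RegExp.escape.
function escapeRegExp(s) {
  return String(s).replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical deleteTag: wraps the escaped tag in \b...\b (the question's
// TAGTODELETE pattern) and replaces every match with an empty string.
function deleteTag(tagList, tag) {
  const pattern = new RegExp('\\b' + escapeRegExp(tag) + '\\b', 'g');
  return tagList.replace(pattern, '');
}

console.log(escapeRegExp('test?'));                 // test\?
console.log(deleteTag('other test test2', 'test')); // other  test2
```

This works for purely alphanumeric tags; the problems only appear once a tag begins or ends with punctuation.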

Take the following example tag list: thisisatest? other test test2 test? test?ing

If I want to delete test? from the list, \btest\?\b matches the test? inside test?ing instead of the standalone test?. If I want to delete thisisatest?, \bthisisatest\?\b finds no matches at all.
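Both failures can be reproduced directly against the example list. This is a sketch; `escapeRegExp` and `boundaryPattern` are hypothetical helper names, with `escapeRegExp` using the question's escaping character set.

```javascript
// Hypothetical local helper, same character set as the question's RegExp.escape.
function escapeRegExp(s) {
  return String(s).replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper that builds the question's \bTAG\b pattern.
function boundaryPattern(tag) {
  return new RegExp('\\b' + escapeRegExp(tag) + '\\b', 'g');
}

const tags = 'thisisatest? other test test2 test? test?ing';

// "test?": \b after "?" needs a word character next, so the only match is
// the "test?" at the start of "test?ing" (index 36), not the standalone tag.
console.log(tags.match(boundaryPattern('test?')));  // [ 'test?' ]
console.log(tags.search(boundaryPattern('test?'))); // 36

// "thisisatest?": "?" is followed by a space, so the trailing \b never matches.
console.log(tags.match(boundaryPattern('thisisatest?'))); // null
```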

I've tried a few iterations but each seems to have its own problems.

Upvotes: 1

Views: 494

Answers (2)

Michael Laszlo

Reputation: 12239

You may be able to work around your current problem with a new regex, but there are probably more headaches in store for you if you stick to this approach. There are better ways to manage tags than concatenating them into a string.

I recommend that you store each tag in an object and represent the tag list as an array of such objects.

Something like this:

var tags = [];
tags.push({ text: 'my new tag!', valid: true });
tags.push({ text: 'yeah, dude', valid: true });

If a user action causes the valid property to be set to false, you can scan the array and splice out the invalid tag.
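A minimal sketch of that scan-and-splice step, using the array from the answer plus one extra tag (`'test?'`) chosen here for illustration. Iterating from the end means `splice()` never shifts an element the loop has yet to visit.

```javascript
// Each tag is stored as its own object, as the answer suggests.
const tags = [];
tags.push({ text: 'my new tag!', valid: true });
tags.push({ text: 'yeah, dude', valid: true });
tags.push({ text: 'test?', valid: true }); // illustrative extra tag

// A user action marks a tag invalid (here: the one whose text is 'test?').
tags.find(tag => tag.text === 'test?').valid = false;

// Scan backwards so splice() does not shift indices still to be visited.
for (let i = tags.length - 1; i >= 0; i--) {
  if (!tags[i].valid) {
    tags.splice(i, 1);
  }
}

console.log(tags.map(tag => tag.text)); // [ 'my new tag!', 'yeah, dude' ]
```

No regex is involved, so punctuation (or even spaces) inside a tag's text cannot cause a wrong match. If building a new array is acceptable, `tags.filter(tag => tag.valid)` does the same in one call.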

Upvotes: 1

Avinash Raj

Reputation: 174706

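This happens because there is no word boundary after the ? when it is followed by a space (or by the end of the string). Use \B after the escaped ? instead: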

\btest\?\B

In the input thisisatest? other test test2 test? test?ing, this matches the test? that is followed by a space, but not the test? that is followed by ing.

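\b matches a position between a word character and a non-word character, in either order (the start and end of the string count as non-word positions).

\B matches a position between two word characters or between two non-word characters.

? is a non-word character and so is a space, so the position between them is a \B position. That makes \B the right choice here.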
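A quick check of \btest\?\B against the question's tag list, reusing the question's escaping character set under a hypothetical local name, `escapeRegExp`. This assumes the tag starts with a word character and ends with a non-word character; a tag shaped differently (for example one ending in a letter) would need \b at that end instead.

```javascript
// Hypothetical helper: same escaping character set as the question's RegExp.escape.
function escapeRegExp(s) {
  return String(s).replace(/[\\^$*+?.()|[\]{}]/g, '\\$&');
}

const tagList = 'thisisatest? other test test2 test? test?ing';

// \b before the tag (it starts with a word character) and \B after it
// (it ends with "?", a non-word character).
const pattern = new RegExp('\\b' + escapeRegExp('test?') + '\\B', 'g');

console.log(tagList.match(pattern));       // [ 'test?' ]
console.log(tagList.search(pattern));      // 30
console.log(tagList.replace(pattern, '')); // thisisatest? other test test2  test?ing
```

Only the standalone test? (index 30) is removed; thisisatest? and test?ing are left alone.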

Upvotes: 1
