Reputation: 1543
I grabbed the following JavaScript regular expression replace from another site to strip out some invalid characters:
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
However, I noticed it wasn't catching occurrences of \u00B7 (the ISO-8859-1 middle dot character).
If I do it in two steps, however, it works:
str = str.replace(/\u00B7/g,'');
str = str.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g,'');
The first replace seems to be included in the second one. Can somebody explain why the second line doesn't work all by itself? Thanks.
Upvotes: 0
Views: 1095
Reputation: 147453
Just to be clear:
/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/
matches all characters not in the set. So to match \u00B7 (and have it replaced with ''), remove it from the pattern:
/[^\u000D\u0020-\u007E\u00A2-\u00A4]/
The ASCII character set is given at http://www.asciitable.com/; that is likely the set you want to keep. The range \u0020-\u007E covers the printable ASCII characters (space through tilde), which are usually the ones of interest; the others are typically not wanted.
\u000D is a carriage return. I would investigate whether you really need \u00A2, \u00A3 and \u00A4 (the cent, pound and generic currency signs).
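A quick way to check this is to run both patterns over the same string. This is only a sketch: the `sample` string and the variable names are made up for illustration, and the two patterns are the ones quoted above.

```javascript
// Hypothetical sample: printable ASCII, a middle dot (U+00B7),
// a pound sign (U+00A3) and a tab (U+0009).
const sample = 'a\u00B7b \u00A3\tc';

// Original pattern: \u00B7 is listed inside the negated class, so it is kept.
const original = /[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g;

// Adjusted pattern: \u00B7 is no longer listed, so it is stripped
// along with everything else outside the class.
const adjusted = /[^\u000D\u0020-\u007E\u00A2-\u00A4]/g;

const keptDot = sample.replace(original, '');
const strippedDot = sample.replace(adjusted, '');

console.log(JSON.stringify(keptDot));     // middle dot survives, tab is gone
console.log(JSON.stringify(strippedDot)); // middle dot and tab are both gone
```

In both cases the tab is removed because it is not in the class; only the adjusted pattern also removes the middle dot.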
Upvotes: 0
Reputation: 1667
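The first and second patterns are completely different. Pattern one replaces \u00B7, while the second pattern replaces all characters NOT listed in the pattern, because the caret at the start of the class negates it. Since \u00B7 is listed, pattern two leaves it alone. Note that removing the caret would invert the class and strip the listed characters instead, so to have pattern two also remove \u00B7, take \u00B7 out of the class.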
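To see the effect of the caret for yourself, here is a rough sketch comparing three variants on one string. The `input` string and variable names are invented for the example; the third pattern is simply the second one with the caret dropped.

```javascript
// Hypothetical input: two letters around a middle dot (U+00B7), then a tab and a letter.
const input = 'x\u00B7y\tz';

// Pattern one: removes only the middle dot.
const onlyDot = input.replace(/\u00B7/g, '');

// Pattern two: negated class, removes everything NOT listed (here, just the tab).
const negated = input.replace(/[^\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g, '');

// Same class without the caret: removes the listed characters instead,
// which here leaves only the tab.
const positive = input.replace(/[\u000D\u00B7\u0020-\u007E\u00A2-\u00A4]/g, '');

console.log(JSON.stringify(onlyDot));
console.log(JSON.stringify(negated));
console.log(JSON.stringify(positive));
```

Without the caret the class matches the characters you listed as valid, so it would strip nearly all of the text rather than just the unwanted characters.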
Upvotes: 2