Reputation: 602
I am using https://www.npmjs.com/package/bad-words and I created a regex to filter special characters.
const Filter = require('bad-words');
const badWordsFilter = new Filter({replaceRegex: /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g});
badWordsFilter.addWords(['badword', 'şğ'])
If a word doesn't contain Turkish characters, it works. But if I write a Turkish character like ş or ğ, the word is not filtered.
Is my regex wrong?
I found this code in the documentation:
var filter = new Filter({ regex: /\*|\.|$/gi });
var filter = new Filter({ replaceRegex: /[A-Za-z0-9가-힣_]/g });
//multilingual support for word filtering
Upvotes: 4
Views: 2015
Reputation: 5556
You most likely have an encoding problem, since your regex works outside your app; see here: https://regex101.com/r/VpItfH/3/.
So I think writing the characters in your regex as escape sequences may help:
See the encoded regex result here: https://regex101.com/r/VpItfH/4/
More details
Trying the following encoded regex in a PCRE regex engine will work (https://regex101.com/r/VpItfH/5):
/[A-Za-z0-9\x{f6}\x{d6}\x{c7}\x{e7}\x{15e}\x{15f}\x{11e}\x{11f}\x{130}\x{131}\x{dc}\x{fc}_]/g
But when you select the JavaScript regex engine, the { and } break the escape, so you need to remove them. If the character is still not recognized, replace \x with \u0, e.g. \x{15e} becomes \u015e. (In JavaScript, \x takes exactly two hex digits and \u exactly four, so a two-digit code such as \x{f6} becomes \xf6 or \u00f6.)
Then you can do the same match as when you use /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g.
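As a minimal sketch of the point above, the following compares the literal character class with an escaped version in plain JavaScript. The variable names and the sample string are illustrative assumptions, and the check runs outside bad-words, so it only suggests that the two patterns are interchangeable.

```javascript
// Literal form of the character class, as in the question.
const literal = /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g;

// Same class written with \uXXXX escapes (four hex digits each),
// so it no longer depends on how the source file is encoded.
const escaped = /[A-Za-z0-9\u00f6\u00d6\u00c7\u00e7\u015e\u015f\u011e\u011f\u0130\u0131\u00dc\u00fc_]/g;

// Illustrative sample text (an assumption, not from the question).
const sample = 'şğ İstanbul çöp!';

const literalMatches = sample.match(literal);
const escapedMatches = sample.match(escaped);

// When this file is saved and read as UTF-8, both patterns
// should pick out the same characters.
console.log(JSON.stringify(literalMatches) === JSON.stringify(escapedMatches));
```

If the comparison prints false in your environment, that points at the file encoding rather than at the regex itself.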
Note: to get the hex code of a character, you can do
"Ğ".charCodeAt(0).toString(16);
and prefix the result with \x (for two-digit results) or \u0 (for three-digit results).
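The note above can be turned into a small helper. This is only a sketch: toUnicodeEscape is a hypothetical name, and it assumes every character is in the Basic Multilingual Plane (true for the Turkish letters here), so a single charCodeAt(0) is enough.

```javascript
// Hypothetical helper: returns the \uXXXX escape for a single BMP character.
// Characters above U+FFFF are not handled by this sketch.
function toUnicodeEscape(ch) {
  const hex = ch.charCodeAt(0).toString(16);
  return '\\u' + hex.padStart(4, '0');
}

// Build the escaped part of the character class from the literal letters.
const turkishChars = 'öÖÇçŞşĞğİıÜü';
const classBody = Array.from(turkishChars, (ch) => toUnicodeEscape(ch)).join('');

// Assemble a regex equivalent to the one in the question.
const pattern = new RegExp('[A-Za-z0-9' + classBody + '_]', 'g');

console.log(classBody);
```

Padding to four digits avoids having to choose between the \x and \u0 prefixes by hand.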
Hope this helps, or at least shows that you can encode characters inside a regex and still match the same text. :)
Upvotes: 2
Reputation: 11050
You need to make that regular expression Unicode-aware by adding the u flag to it. More precisely, change /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/g into /[A-Za-z0-9öÖÇçŞşĞğİıÜü_]/gu (note the added u at the end). This works only in modern browsers (basically, all but Internet Explorer), though. There are other options as well that you may want to consider if you need to support older browsers.
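One place where the u flag clearly changes what a pattern can express is Unicode property escapes (ES2018), which are only valid with that flag. The sketch below is a possible alternative to listing every letter by hand; maskWordChars is a hypothetical name, and whether bad-words accepts such a pattern as replaceRegex is an assumption not verified here.

```javascript
// \p{L} matches any letter and \p{N} any number, in any script.
// Property escapes are a syntax error without the u flag.
const anyWordChar = /[\p{L}\p{N}_]/gu;

// Hypothetical helper: masks every letter, digit, or underscore with '*'.
function maskWordChars(text) {
  return text.replace(anyWordChar, '*');
}

// Spaces and punctuation are left alone; Turkish letters are masked
// just like ASCII ones.
console.log(maskWordChars('şğ İstanbul 123!'));
```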
Upvotes: 1
Reputation: 18684
Can you please try with:
var filter = new Filter({ replaceRegex: /(\w+)/gi });
You definitely have to use the replaceRegex option.
The pattern matches runs of word characters case-insensitively; regex101 gives a descriptive breakdown of what /(\w+)/gi does.
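A quick check of this pattern in plain JavaScript may be useful before relying on it. In JavaScript, \w covers only [A-Za-z0-9_], and \b is defined in terms of \w, so neither sees Turkish letters as word characters. The inputs below are illustrative assumptions, and whether bad-words uses \b internally is not confirmed here.

```javascript
const wordRuns = /(\w+)/gi;

// Illustrative input: a Turkish-only word followed by an ASCII word.
const found = 'şğ badword'.match(wordRuns);

// Only the ASCII word is picked up, because ş and ğ are not in \w.
console.log(found);

// \b needs a \w character on exactly one side, so there is no boundary
// around a word made up only of non-ASCII letters.
const boundaryHit = /\bşğ\b/.test('şğ');
console.log(boundaryHit);
```

If a library wraps each listed word in \b...\b, a word such as şğ would never match, regardless of the replaceRegex option.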
Upvotes: 1
Reputation: 1055
Save your JavaScript file as UTF-8 and update your meta tag to:
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
Hope this helps.
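To see why the file encoding matters, here is a small sketch, assuming Node.js (it uses the global Buffer). It shows the UTF-8 bytes for ş and what happens when the same bytes are read with a different charset; the variable names are illustrative.

```javascript
// ş (U+015F) encoded as UTF-8 is the two bytes c5 9f.
const utf8Bytes = Buffer.from('\u015f', 'utf8');
console.log(utf8Bytes.toString('hex'));

// Reading those same bytes as Latin-1 yields two unrelated characters,
// which is the kind of mismatch a wrong file encoding or charset produces.
const misread = utf8Bytes.toString('latin1');
console.log(misread.length, misread === '\u015f');
```

A regex or word list that went through such a misread would contain different characters from the ones typed by the user, so nothing would match.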
Upvotes: 0