Reputation: 15434
Normally, to generate URL slugs I use the https://github.com/jprichardson/string.js library, specifically its slugify
method. However, it removes all Chinese characters. As a workaround I use the following function:
var slugify = function (str) {
  str = str.replace(/\s+/g, '-'); // replace runs of whitespace with a dash
  str = encodeURIComponent(str);  // percent-encode (this also encodes the Chinese characters)
  return str;
};
So for the input 中文 标题
I get %E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98
and it looks like this in the browser's address bar (and it works):
http://example.com/中文-标题
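To see where that encoded string comes from, the sketch below encodes the sample title one character at a time. Characters from U+0800 to U+FFFF, which includes the common Chinese ideographs, take three bytes in UTF-8, so `encodeURIComponent` turns each one into three `%XX` escapes; decoding restores the readable form shown in the address bar. The variable name `title` is illustrative only.

```javascript
// Encode the (already dashed) sample title one character at a time.
const title = '中文-标题';
for (const ch of title) {
  // code point in hex, then its percent-encoded UTF-8 bytes
  console.log(ch, ch.codePointAt(0).toString(16).toUpperCase(), encodeURIComponent(ch));
}
// 中 4E2D %E4%B8%AD
// 文 6587 %E6%96%87
// - 2D -
// 标 6807 %E6%A0%87
// 题 9898 %E9%A2%98

// Decoding gives back the readable slug.
console.log(decodeURIComponent('%E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98')); // 中文-标题
```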
However, I also want to remove special characters such as !@#$%^&*)
etc. The problem is that the string.js
library uses the following code internally:
.replace(/[^\w\s-]/g, '')
It removes the special characters, BUT it ALSO removes Chinese characters, because they don't match \w
(which in JavaScript matches only the ASCII letters, digits and underscore)...
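The internal pattern can be checked on its own with an empty-string replacement. Every Chinese character falls into the negated class and is deleted, while ASCII words survive. The name `stringJsStyle` is just a label for this sketch, not something exported by string.js.

```javascript
// Without the u flag, \w is exactly [A-Za-z0-9_], so Chinese characters
// are "not \w, not whitespace, not a dash" and get removed.
const stringJsStyle = /[^\w\s-]/g;

console.log(JSON.stringify('中文 标题!'.replace(stringJsStyle, '')));   // " "
console.log(JSON.stringify('hello world!'.replace(stringJsStyle, ''))); // "hello world"
console.log(/\w/.test('中')); // false
```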
So my question is: how can I modify the regexp above so that it keeps Chinese characters?
I tried
replace(/[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g,'')
but it still removes Chinese characters...
Upvotes: 4
Views: 4032
Reputation: 404
You can try uslug, which slugifies 汉语/漢語
to 汉语漢語.
If you want to transliterate Chinese characters to Pinyin, try transliteration.
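If adding a dependency is not an option, a similar effect can be sketched with ES2018 Unicode property escapes: keep letters, combining marks and numbers from any script, and drop everything else. This is a swapped-in technique, not uslug's implementation; the function name `unicodeSlug` is hypothetical, and it assumes an engine that supports `\p{...}` with the `u` flag (Node 10+ and current browsers).

```javascript
// Hypothetical helper, not uslug's code.
function unicodeSlug(str) {
  return str
    .replace(/[^\p{L}\p{M}\p{N}\s-]/gu, '') // drop punctuation/symbols, keep letters in any script
    .trim()                                 // no leading/trailing dashes from edge whitespace
    .replace(/[\s-]+/g, '-')                // collapse whitespace/dash runs into one dash
    .toLowerCase();                         // no effect on Chinese, lowercases Latin letters
}

console.log(unicodeSlug('汉语/漢語')); // 汉语漢語
console.log(unicodeSlug('中文 标题!')); // 中文-标题
```

The result can still be passed through `encodeURIComponent` afterwards, as in the question's workaround.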
Upvotes: 1
Reputation: 4339
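If you want to match (or exclude) the dash (-)
character in a set of characters (square brackets), you have to put it at the end; otherwise it can be read as part of a range.
Your regexp matches characters that are not:
a-z
A-Z
0-9
_
\s-\u3400 (that's your problem: \s is a class, not a single character, so this is not a range; it is read as \s, a literal - and \u3400)
\u9FBF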
You want to do:
replace(/[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g,'')
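The difference between the two patterns is easy to see side by side. In non-unicode mode, engines following the web-compatibility rules of ECMAScript Annex B accept `\s-\u3400` but treat it as three separate items, so the tried class contains only the two endpoint characters, not the block between them. The variable names are illustrative.

```javascript
// The question's attempt: "\s-\u3400-\u9FBF" is \s, "-", U+3400, "-", U+9FBF, not a range.
const tried = /[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g;
// The fix from this answer: the range stands on its own and the literal dash comes last.
const fixed = /[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g;

const input = '中文 标题!@#';
console.log(JSON.stringify(input.replace(tried, ''))); // " "
console.log(JSON.stringify(input.replace(fixed, ''))); // "中文 标题"
```

With the `u` flag, the tried pattern is a SyntaxError instead of a silent mismatch. Note also that the range stops at U+9FBF, so ideographs later added to the CJK Unified Ideographs block (up to U+9FFF) are not covered.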
Upvotes: 3
Reputation: 11106
Use a positive match list instead:
replace(/[\!@#\$%^&\*\)]/g,'')
That said, I would consider taking the URL meta characters (#, %, &) out of that list:
replace(/[\!@\$\^\*\)]/g,'')
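Combined with the question's workaround, the positive list has to run before encoding: `encodeURIComponent` emits `%` escapes, so stripping `%` afterwards would corrupt them. A minimal sketch, assuming that ordering; `slugifyKeepingCJK` is a hypothetical name, and the `trim()` call is an addition so edge whitespace does not become leading or trailing dashes.

```javascript
// Hypothetical combination: positive match list first, then dashes, then encoding.
function slugifyKeepingCJK(str) {
  const withoutSpecials = str.replace(/[!@#$%^&*)]/g, ''); // remove only the listed characters
  const dashed = withoutSpecials.trim().replace(/\s+/g, '-'); // whitespace runs become one dash
  return encodeURIComponent(dashed); // Chinese characters become %XX escapes
}

console.log(slugifyKeepingCJK('中文 标题!')); // %E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98
```

Because a positive list removes only what it enumerates, other punctuation such as `?` or `/` is kept and percent-encoded (`%3F`, `%2F`) rather than removed.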
Upvotes: 0