user606521

Reputation: 15434

How to generate a URL slug from Chinese characters?

Normally, to generate a URL slug I use the slugify method of the https://github.com/jprichardson/string.js library. However, it removes all Chinese characters. As a workaround I use the following function:

var slugify = function (str) {
   str = str.replace(/\s+/g, '-'); // replace whitespace runs with dashes
   str = encodeURIComponent(str);  // percent-encode (this encodes the Chinese characters)
   return str;
};

So for the input 中文 标题 I get %E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98, which the browser's address bar displays like this (and it works):

http://example.com/中文-标题

However, I also want to remove special characters such as !@#$%^&*). The problem is that the string.js library uses the following piece of code internally:

.replace(/[^\w\s-]/g

It removes the special characters, but it also removes Chinese characters, because they do not match \w.

So my question is: how can I modify the regexp above so that it keeps Chinese characters?
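
To make the problem concrete, here is a minimal sketch of what that pattern does to a mixed string. The variable names are only illustrative, and the pattern is the fragment quoted above with an empty replacement string assumed.

```javascript
// In JavaScript, \w matches only [A-Za-z0-9_], so Chinese characters fall outside it.
var input = '中文 标题!';

// Strip everything that is not \w, whitespace or a dash (replacement '' is an assumption).
var stripped = input.replace(/[^\w\s-]/g, '');

console.log(JSON.stringify(stripped)); // only the single space survives
console.log(/\w/.test('中'));          // false
console.log(/\w/.test('a'));           // true
```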


I tried

replace(/[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g,'')

But it still removes the Chinese characters...

Upvotes: 4

Views: 4032

Answers (3)

Equal

Reputation: 404

You can try uslug, which slugifies 汉语/漢語 to 汉语漢語.

If you want to convert Chinese characters to Pinyin instead, try transliteration.

Upvotes: 1

Volune

Reputation: 4339

If you want to match (or exclude) a literal dash - inside a set of characters (square brackets), put it at the end of the set (the start, or escaping it as \-, also works). Anywhere else it is read as a range operator.

Your regexp matches characters that are not

  • in the range a-z
  • in the range A-Z
  • in the range 0-9
  • _
  • in the "range" \s-\u3400 (that's your problem: the dash is consumed here, so \u3400-\u9FBF is never read as a range)
  • -
  • \u9FBF

You want to do:

replace(/[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g,'')
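
As a rough sketch, the two patterns can be compared side by side and the corrected one dropped into the question's workaround. The helper name slugifyKeepingChinese is hypothetical, and the trim() call is an extra step assumed here so that a trailing space does not turn into a trailing dash.

```javascript
// Original attempt: "\s-\u3400" swallows the dash, so no \u3400-\u9FBF range is formed.
var broken = /[^a-zA-Z0-9_\s-\u3400-\u9FBF]/g;
// Corrected: the range comes first and the literal dash is last.
var fixed = /[^a-zA-Z0-9_\u3400-\u9FBF\s-]/g;

// Hypothetical helper: strip, then apply the same steps as the question's workaround.
function slugifyKeepingChinese(str) {
  return encodeURIComponent(
    str
      .replace(fixed, '')    // drop everything outside the allowed set
      .trim()                // assumption: avoid leading/trailing dashes
      .replace(/\s+/g, '-')  // collapse whitespace runs into a single dash
  );
}

console.log(JSON.stringify('中文 标题!@#'.replace(broken, ''))); // " " (Chinese characters removed)
console.log('中文 标题!@#'.replace(fixed, ''));                  // "中文 标题"
console.log(slugifyKeepingChinese('中文 标题!@#$%^&*)'));
// "%E4%B8%AD%E6%96%87-%E6%A0%87%E9%A2%98"
```

Note that \u3400-\u9FBF covers only part of the CJK blocks, so characters outside that range would still be stripped.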

Upvotes: 3

Axel Amthor

Reputation: 11106

Use a positive match list instead, naming the characters you want removed:

  replace(/[\!@#\$%^&\*\)]/g,'')

That said, I would consider leaving the URL metacharacters (#, % and &) out of that list:

   replace(/[\!@\$\^\*\)]/g,'')
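
For illustration, a quick sketch of how the two lists behave on a sample string. The variable names are made up for this example; the patterns are the ones above.

```javascript
// First list: removes every listed special character.
var allSpecials = /[\!@#\$%^&\*\)]/g;
// Second list: same, but leaves #, % and & in place.
var withoutUrlMeta = /[\!@\$\^\*\)]/g;

var sample = '中文 标题!@#$%^&*)';

console.log(sample.replace(allSpecials, ''));    // "中文 标题"
console.log(sample.replace(withoutUrlMeta, '')); // "中文 标题#%&"
```

Because this approach only removes what is listed, Chinese characters are untouched, but any special character missing from the list also passes through.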

Upvotes: 0
