Wesley Skeen

Reputation: 8285

JavaScript to replace Chinese characters

I am building a JavaScript array from user input. The array builds fine, but it crashes if the user enters Chinese symbols. I assume this happens when the user enters a Chinese ", a Chinese comma, or a Chinese '. My program already replaces the English versions of these characters, but I don't know how to replace the Chinese versions.

Can anyone help?

Thanks to all for their input

Upvotes: 4

Views: 2760

Answers (5)

Evandro Coan

Reputation: 9418

The question doesn't ask for it, but by adding \u30a0-\u30ff\u3040-\u309f you can also strip Japanese Hiragana and Katakana:

replace(/[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff\u30a0-\u30ff\u3040-\u309f]/g, '')
  1. https://regex101.com/r/4Aw9Q8/1
  2. https://en.wikipedia.org/wiki/Katakana_(Unicode_block)
  3. https://en.wikipedia.org/wiki/Hiragana_(Unicode_block)

Upvotes: 0

tsroten

Reputation: 2764

Building on broofa's answer:

If you just want to find and replace Chinese punctuation, such as a fullwidth quotation mark, comma, or full stop, then you'll want the Unicode characters in the range FF00-FFEF. Here is a PDF from Unicode showing them: http://unicode.org/charts/PDF/UFF00.pdf
I think you'd want to replace at least these: FF01, FF02, FF07, FF0C, FF0E, FF1F, and FF61. Those should cover the major Chinese punctuation marks. You can use broofa's replace function.
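
A minimal sketch of that idea, assuming you want to map each listed mark to an ASCII counterpart rather than delete it. The name `normalizePunctuation` and the mapping table are illustrative, not part of any answer here:

```javascript
// Hypothetical lookup table: the code points listed above, mapped to ASCII.
const PUNCTUATION_MAP = {
  '\uFF01': '!',  // fullwidth exclamation mark
  '\uFF02': '"',  // fullwidth quotation mark
  '\uFF07': "'",  // fullwidth apostrophe
  '\uFF0C': ',',  // fullwidth comma
  '\uFF0E': '.',  // fullwidth full stop
  '\uFF1F': '?',  // fullwidth question mark
  '\uFF61': '.',  // halfwidth ideographic full stop
};

// Replace every character found in the table; leave everything else alone.
function normalizePunctuation(input) {
  return input.replace(
    /[\uFF01\uFF02\uFF07\uFF0C\uFF0E\uFF1F\uFF61]/g,
    (ch) => PUNCTUATION_MAP[ch]
  );
}
```

Chinese text also commonly uses marks outside FF00-FFEF, such as U+3001, U+3002, and the curly quotes U+201C and U+201D, so the table may need extending for your input. Once the marks are ASCII, the existing English replacement logic can handle them.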

Upvotes: 1

broofa

Reputation: 38112

From "What's the complete range for Chinese characters in Unicode?", the CJK Unicode ranges are:

  • 4E00-9FFF (common)
  • 3400-4DFF (rare)
  • F900-FAFF (compatibility - duplicates, unifiable variants, corporate characters)
  • 20000-2A6DF (rare, historic)
  • 2F800-2FA1F (compatibility - supplement)

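JS strings are sequences of 16-bit code units, so characters above FFFF are stored as surrogate pairs and a plain \uXXXX character class cannot match them directly. The last two ranges are also rare, so they probably aren't of great interest. Thus, if you're building a JS string, you should be able to filter out Chinese characters using something like: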

replace(/[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff]/g, '')
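
As a rough sketch of how that pattern might be used, the snippet below wraps it in a hypothetical `stripCjk` helper. The second pattern is an assumed extension, not part of the answer: it relies on the ES2015 `u` flag so that `\u{...}` escapes can match the two supplementary ranges as well.

```javascript
// Pattern from the answer above: ranges at or below FFFF only.
const CJK_BMP = /[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff]/g;

// Assumed extension: with the `u` flag, \u{...} escapes match code points
// above FFFF, so the two supplementary ranges can be included too.
const CJK_ALL =
  /[\u4e00-\u9fff\u3400-\u4dff\uf900-\ufaff\u{20000}-\u{2a6df}\u{2f800}-\u{2fa1f}]/gu;

// Hypothetical helper: remove every character matched by the pattern.
function stripCjk(input, pattern = CJK_BMP) {
  return input.replace(pattern, '');
}
```

Calling `stripCjk(userInput)` applies the original pattern. Passing `CJK_ALL` as the second argument also removes the rare and historic characters, which the first pattern leaves in place.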

Upvotes: 4

RoToRa

Reputation: 38400

.NET provides JavaScriptSerializer and its Serialize method, which creates correctly escaped JavaScript literals (I personally haven't used it with Chinese characters, but there is no reason it shouldn't work).
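
If the array is assembled in JavaScript rather than on a .NET server, a rough analogue of the same idea is to let the built-in `JSON.stringify` do the escaping instead of replacing characters by hand. This is only a sketch under that assumption, and `toArrayLiteral` is a hypothetical name:

```javascript
// Hypothetical helper: turn user-supplied strings into an escaped array literal.
// JSON.stringify escapes double quotes and backslashes inside each string.
function toArrayLiteral(values) {
  return JSON.stringify(values);
}

// Sample input mixing ASCII quotes, commas, and fullwidth punctuation.
const sample = ['say "hi", ok', '\u4f60\u597d\uFF0C\uFF02world\uFF02'];
const literal = toArrayLiteral(sample);
```

The resulting string parses back to the original values with `JSON.parse(literal)`, so neither the English nor the Chinese punctuation has to be stripped first.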

Upvotes: 1

Sergey

Reputation: 8071

You need to use a Unicode replacer. I think this will help you: http://answers.yahoo.com/question/index?qid=20080528045141AAJ0AIS

Upvotes: 2
