user3861247


Replacing UTF-8 characters

I am working with the open-source JavaScript library jsPDF. The library does not support UTF-8 characters. Is there any way I can remove all the UTF-8 quote characters from my HTML string, using a regex or any other method?

Pseudocode:

$(htmlstring).replace("utf-8 quotes character" , "") 

Upvotes: 9

Views: 18946

Answers (2)

First off: I urge you to stop using jsPDF if it doesn't support Unicode. It's mid-2014, and the lack of Unicode support should have meant the death of the project two years ago. But that's just my personal conviction and not part of the answer you're looking for.

If jsPDF only supports ANSI (a 256-character block, rather than ASCII's 128-character block), then you can simply use a regex replace to remove everything above \xFF:

"lolテスト".replace(/[\u0100-\uFFFF]/g,'');
// gives us "lol"
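The same regex wrapped in a small helper, as a sketch (the function name is made up for illustration). One detail worth knowing: because the pattern has no `u` flag, a character outside the Basic Multilingual Plane, such as an emoji, is seen as two UTF-16 surrogate code units, both in U+D800 to U+DFFF, so it is removed as well, while Latin-1 letters like "é" (U+00E9) are kept:

```javascript
// Hypothetical helper: strip every UTF-16 code unit above U+00FF.
// No `u` flag, so an emoji's two surrogate halves (U+D800..U+DFFF) are
// each matched by the class and removed.
function stripAboveLatin1(str) {
  return str.replace(/[\u0100-\uFFFF]/g, '');
}

console.log(stripAboveLatin1('lolテスト'));              // "lol"
console.log(stripAboveLatin1('caf\u00e9 \u{1F600}!')); // "café !"
```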

If you only want to get rid of quotation marks (but leave in other Unicode that may still break jsPDF), you can use a pattern for just the quotation marks, based on where they live in the Unicode map:

string.replace(/[\u2018-\u201F\u275B-\u275E]/g, '')

will catch ['‘','’','‚','‛','“','”','„','‟','❛','❜','❝','❞'], although what you probably want is to replace them with the corresponding safe character instead. Good news: just make a replacement map for the list above, and work with that.
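A minimal sketch of that replacement approach, assuming plain ASCII ' and " are acceptable stand-ins in the PDF output (the map and function names are made up for illustration). It reuses the same character class and passes a callback to `replace`, so each matched quote is looked up in the map:

```javascript
// Hypothetical map from the typographic quotes listed above to ASCII stand-ins.
const QUOTE_REPLACEMENTS = {
  '\u2018': "'", // ‘
  '\u2019': "'", // ’
  '\u201A': "'", // ‚ (single low-9 quotation mark)
  '\u201B': "'", // ‛
  '\u201C': '"', // “
  '\u201D': '"', // ”
  '\u201E': '"', // „
  '\u201F': '"', // ‟
  '\u275B': "'", // ❛
  '\u275C': "'", // ❜
  '\u275D': '"', // ❝
  '\u275E': '"', // ❞
};

// Same class as the regex above; every character it matches has a map entry.
function replaceQuotes(str) {
  return str.replace(/[\u2018-\u201F\u275B-\u275E]/g, (ch) => QUOTE_REPLACEMENTS[ch]);
}

console.log(replaceQuotes('\u201Chello\u201D \u2018world\u2019')); // "hello" 'world' (with straight quotes)
```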

2017 edit:

ES6 introduced a new escape for Unicode strings, the \u{...} form, which can take "any number of hex digits" inside the curly braces, so a full Unicode 9 compatible regexp would now be:

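// With the ES6 `u` flag, \u{...} escapes can be used directly in a regex literal;
// the flag is required, and it also makes the class match whole code points
// instead of UTF-16 code units. U+10FFFF is the last valid code point.
const searchPattern = /[\u{100}-\u{10FFFF}]/gu;
const c = `lolテスト`.replace(searchPattern, ``);
// gives us "lol"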
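To see why the `u` flag matters when the pattern is built from strings, here is a quick comparison (the sample string is arbitrary; U+FF01 is FULLWIDTH EXCLAMATION MARK). Without `u`, the engine reads the pattern source as UTF-16 code units, so an astral end point like U+10FFFF splits into two surrogates and the range quietly stops at U+DBFF:

```javascript
const start = '\u{100}';
const end = '\u{10FFFF}';               // last valid Unicode code point
const sample = 'lol\uFF01テスト';

// Without `u`: the class becomes U+0100..U+DBFF plus the lone surrogate U+DFFF,
// so characters in U+E000..U+FFFF (such as U+FF01) are not matched.
const withoutU = new RegExp(`[${start}-${end}]`, 'g');

// With `u`: the same source is read as code points, giving U+0100..U+10FFFF.
const withU = new RegExp(`[${start}-${end}]`, 'gu');

console.log(sample.replace(withoutU, '')); // "lol！" (U+FF01 survives)
console.log(sample.replace(withU, ''));    // "lol"
```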

Upvotes: 11

Valerij

Reputation: 27758

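Use

htmlstring.replace(/[^\x00-\x7F]/g, '')

to remove all non-ASCII characters. Call replace on the string itself: $(htmlstring) would give you a jQuery object, which has no replace method.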

(via regex-any-ascii-character)
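A possible refinement, not part of the original answer: if you would rather keep "é" as "e" than lose the letter entirely, you can apply Unicode NFKD normalization first, drop the combining accents, and only then strip what is still non-ASCII. A sketch (the function name is made up for illustration); note that curly quotes have no decomposition, so they are still simply removed:

```javascript
// Hypothetical helper: approximate a string in ASCII before stripping.
function toAsciiApprox(str) {
  return str
    .normalize('NFKD')               // compatibility decomposition: "é" -> "e" + U+0301, "ﬁ" -> "fi"
    .replace(/[\u0300-\u036f]/g, '') // drop combining diacritical marks
    .replace(/[^\x00-\x7F]/g, '');   // drop anything still outside ASCII
}

console.log(toAsciiApprox('caf\u00e9 \u201Cquoted\u201D \uFB01le')); // "cafe quoted file"
```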

Upvotes: 3
