Jake

Reputation: 1185

Convert Unicode to UTF8

I am trying to mash up two different third-party services in JavaScript. One of them gives me strings in one encoding, and I need to convert them to a different encoding.

For example, the string is tést.

I am given an encoded string like this: te%u0301st. The accent is encoded as %u0301. I need to convert it to this string: t%C3%A9st, where the é is encoded as %C3%A9. How can I convert e%u0301 to %C3%A9 in JavaScript?

Thanks

Upvotes: 1

Views: 6145

Answers (2)

ecmanaut

Reputation: 5150

If all you need is any URL-escaped Unicode encoding, this will do the trick:

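function convert(s) {
  // Turn one %uXXXX escape into the UTF-16 code unit it names.
  function parse(match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  }
  // [0-9a-f], not [0-f]: the range 0-f also matches punctuation and non-hex letters.
  return encodeURIComponent(s.replace(/%u([0-9a-f]{4})/gi, parse));
}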

convert('te%u0301st'); // => te%CC%81st

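If you specifically need Normal Form C, you need to implement a whole lot of Unicode intelligence yourself, because JavaScript treats the combining accent as a character of its own: 'te\u0301st'.length (the decomposed string, which displays as tést) is 5 in JavaScript.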
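To make the length point concrete, here is a small sketch comparing the decomposed string from the question with its precomposed equivalent. The variable names are only for illustration.

```javascript
// e followed by U+0301 COMBINING ACUTE ACCENT (what the question's input decodes to)
var decomposed = 'te\u0301st';
// U+00E9 LATIN SMALL LETTER E WITH ACUTE (what the question wants to end up with)
var composed = 't\u00e9st';

console.log(decomposed.length);              // 5
console.log(composed.length);                // 4
console.log(decomposed === composed);        // false
console.log(encodeURIComponent(decomposed)); // te%CC%81st
console.log(encodeURIComponent(composed));   // t%C3%A9st
```

Both strings render the same way, but they are different sequences of code units, which is why the URL-escaped output differs.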

Upvotes: 0

Brian Campbell

Reputation: 332736

You appear to be trying to normalize your input, probably to Unicode Normalization Form C (NFC). I do not know of any simple way to do this in JavaScript; you may need to implement the normalization algorithm yourself, or find a library that does so.
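For what it's worth, engines that implement ES2015 or later provide a built-in String.prototype.normalize, which may remove the need for a hand-written implementation. The following is only a sketch under that assumption: convertNfc is a hypothetical helper name, and it assumes the input contains no escapes other than %uXXXX.

```javascript
// Hypothetical helper; assumes an ES2015+ runtime with String.prototype.normalize.
function convertNfc(s) {
  // Decode each %uXXXX escape into the UTF-16 code unit it names.
  var decoded = s.replace(/%u([0-9a-f]{4})/gi, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
  // Compose base letter + combining mark into one code point where one exists,
  // then percent-encode the result as UTF-8.
  return encodeURIComponent(decoded.normalize('NFC'));
}

console.log(convertNfc('te%u0301st')); // t%C3%A9st
```

In older engines without normalize, this would throw, so a library or a custom implementation would still be needed there.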

edited to remove answer to the wrong question

Upvotes: 2
