Rhubbarb

Reputation: 4418

How do I get an ASCII code from a string in JavaScript?

(Similar questions to this have been asked on StackOverflow, but not exactly this. The nearest is probably "javascript how to convert unicode string to ascii", where there is already the remark "this has to be a dup[licate]". I have read some similar posts, but they don't answer my specific question. I've looked on the very good W3Schools site, and have also Googled it, but not found the answer that way either. So any hints here would be very much appreciated.)


I have an array of bytes being passed to a piece of JavaScript. In the JavaScript the data arrives in a string. I do not know the mechanism of transfer, as it's from a 3rd-party application. I do not even know whether the string is "wide" or "narrow".

In my JavaScript, I have some code like b = str.charCodeAt(pos);.

My problem is that a byte value such as 0x86 = 134 is coming through as character 0x2020 = 8224. This seems to be because my original byte is being interpreted as a 'dagger' character and then translated to the equivalent Unicode code point. (The mapping 0x86 → '†' is actually Windows-1252 rather than strict Latin-1, which assigns 0x80..0x9F to control codes; the problem may or may not be JavaScript's 'fault'.) Similar problems occur with other values: the ranges 0x00..0x7F and 0xA0..0xFF seem to be fine, but most values from 0x80..0x9F are affected, and in each case the resulting value seems to be the Unicode code point for the character the original byte represents in Windows-1252.

Another observation is that the length of the string is what I'd expect for a narrow string if the length were measured in bytes. (On the other hand, if length returns a count of abstract characters, this doesn't tell me anything.)

So, in JavaScript, is there a way of getting at the 'raw' bytes in a string, of getting a Latin-1 or ASCII character code directly, of converting between character encodings, or of defining the default encoding?

I could write my own mapping, but I'd rather not. I expect that is what I'll end up doing, but that has the feel of a kludge on a kludge.
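
For reference, the sort of mapping I'm imagining would look something like this. It's a sketch only, assuming the substitutions follow the Windows-1252 code chart (which would explain 0x86 arriving as U+2020); the function name is just for illustration:

// Map the Unicode code points that Windows-1252 assigns to bytes
// 0x80..0x9F back to their original byte values. (The bytes 0x81,
// 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252.)
var cp1252FromUnicode = {
  0x20AC: 0x80, 0x201A: 0x82, 0x0192: 0x83, 0x201E: 0x84,
  0x2026: 0x85, 0x2020: 0x86, 0x2021: 0x87, 0x02C6: 0x88,
  0x2030: 0x89, 0x0160: 0x8A, 0x2039: 0x8B, 0x0152: 0x8C,
  0x017D: 0x8E, 0x2018: 0x91, 0x2019: 0x92, 0x201C: 0x93,
  0x201D: 0x94, 0x2022: 0x95, 0x2013: 0x96, 0x2014: 0x97,
  0x02DC: 0x98, 0x2122: 0x99, 0x0161: 0x9A, 0x203A: 0x9B,
  0x0153: 0x9C, 0x017E: 0x9E, 0x0178: 0x9F
};

// Hypothetical replacement for str.charCodeAt(pos): undo the
// byte -> Windows-1252 -> Unicode translation where it applies.
function byteAt(str, pos) {
  var code = str.charCodeAt(pos);
  if (code < 0x100) return code;               // already a plain byte value
  var mapped = cp1252FromUnicode[code];
  return mapped !== undefined ? mapped : code; // unknown: pass through
}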

I'm also looking into whether there's anything I can adjust in the calling application (as it could be passing the data as a wide string, although I doubt it).

Either way, though, I'd be interested in whether there is a simple JavaScript solution, or to understand why there isn't.

(If the incoming data was character data, having Unicode dealt with so automatically would be great. But it's not, it's just a binary data stream.)

Thanks.

Upvotes: 3

Views: 4449

Answers (2)

Nicholas Carey

Reputation: 74385

Start with the JavaScript (ECMAScript) spec: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf. It says:

8.4 The String Type

The String type is the set of all finite ordered sequences of zero or more 16-bit unsigned integer values (“elements”). The String type is generally used to represent textual data in a running ECMAScript program, in which case each element in the String is treated as a code unit value (see Clause 6). Each element is regarded as occupying a position within the sequence. These positions are indexed with nonnegative integers. The first element (if any) is at position 0, the next element (if any) at position 1, and so on. The length of a String is the number of elements (i.e., 16-bit values) within it. The empty String has length zero and therefore contains no elements.

When a String contains actual textual data, each element is considered to be a single UTF-16 code unit. Whether or not this is the actual storage format of a String, the characters within a String are numbered by their initial code unit element position as though they were represented using UTF-16. All operations on Strings (except as otherwise stated) treat them as sequences of undifferentiated 16-bit unsigned integers; they do not ensure the resulting String is in normalised form, nor do they ensure language-sensitive results.

NOTE The rationale behind this design was to keep the implementation of Strings as simple and high-performing as possible. The intent is that textual data coming into the execution environment from outside (e.g., user input, text read from a file or received over the network, etc.) be converted to Unicode Normalised Form C before the running program sees it. Usually this would occur at the same time incoming text is converted from its original character encoding to Unicode (and would impose no additional overhead). Since it is recommended that ECMAScript source code be in Normalised Form C, string literals are guaranteed to be normalised (if source text is guaranteed to be normalised), as long as they do not contain any Unicode escape sequences.
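
In other words, length and charCodeAt count 16-bit code units, not abstract characters. For instance, a character outside the Basic Multilingual Plane occupies two elements (values here are straight from the Unicode charts):

// U+1D11E MUSICAL SYMBOL G CLEF, written as a surrogate pair:
var clef = "\uD834\uDD1E";
clef.length === 2;             // one character, two 16-bit elements
clef.charCodeAt(0) === 0xD834; // the lead surrogate, not a code point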

What charCodeAt(p) gives you is the UTF-16 value (a 16-bit number) of the character at index p in the string. Since UTF-16 directly represents Unicode's Basic Multilingual Plane (that would be code points U+0000..U+D7FF and U+E000..U+FFFF), your Latin-1 characters should be the values you expect them to be.
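
A quick console check illustrates the point, and shows why the reported value is a red flag (character values here are standard Unicode):

"\u00E9".charCodeAt(0) === 0xE9;   // é: BMP code point equals the Latin-1 byte
"\u2020".charCodeAt(0) === 0x2020; // †: the value being seen for byte 0x86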

The fact that they are not suggests to me that you have an encoding problem with the inbound 3rd-party octet stream: if the conversion to UTF-16 gets the encoding of that stream wrong, you'll get odd results.

Perhaps it is being treated as vanilla ASCII, when in fact it is UTF-8 (or vice versa). UTF-8 represents code points above 0x7F as 2-, 3- or 4-octet sequences.
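
As a sketch of that kind of mix-up in one direction (the octet values below are the standard UTF-8 encoding of U+00E9):

// The octets 0xC3 0xA9 encode 'é' (U+00E9) in UTF-8. Decoded as
// Latin-1 instead, they become two separate code units:
var misdecoded = "\u00C3\u00A9";   // "Ã©"
misdecoded.length === 2;           // two code units, not one
misdecoded.charCodeAt(0) === 0xC3; // not the 0xE9 that was meant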

Upvotes: 3

Mike Samuel

Reputation: 120586

There is no such thing as the raw bytes in a String. The EcmaScript spec defines a string as a sequence of UTF-16 code units. That is the most fine-grained representation exposed by any interpreter I have ever encountered.

In the browser there are no built-in encoding libraries. You have to roll your own if you are trying to represent a byte array as a string and want to re-encode it.

If your string already happens to be valid ASCII, then you can get the numeric value of a code unit by using the charCodeAt method.

"\n".charCodeAt(0) === 10

Upvotes: 6
