Ben Muircroft

Reputation: 3034

I don't understand the string-to-byte aspect of computing/JavaScript

Two years on, I keep coming back to this topic and seeing people discuss it, and I still don't understand what is going on.

Following this SO post:

String length in bytes in JavaScript

I want to understand this part of JavaScript! I am also interested in calculating the kB size of a bitcoin transaction before I push it to the blockchain. The more important of the two, though, is that I finally understand what these users are doing, because it's come up more than once and I just don't get it!

I've tried three of the functions given as answers, but they all seem to do nothing more than return string.length, whereas I would expect them to return a different value (the size of the string in bytes/kilobytes/megabytes).

function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}

console.log(byteCount('hello'),'hello'.length);//5,5


function getLengthInBytes(str) {
    var b = str.match(/[^\x00-\xff]/g);
    return (str.length + (!b ? 0 : b.length));
}

console.log(getLengthInBytes('hello'),'hello'.length);//5,5


console.log((new TextEncoder('utf-8').encode('hello')).length,'hello'.length);//5,5

It's annoying that this makes no sense to me! Clearly these people would not be discussing how to get something they can easily get with string.length, so what are they trying, and succeeding, to return?

Should the string instead be binary? (like so: How to convert text to binary code in JavaScript?)

Upvotes: 0

Views: 55

Answers (2)

Jonas Wilms

Reputation: 138457

There are a lot of different characters in the world, and they don't all fit in one byte of data. That's why some characters use more than one byte. Some examples: "Äüöôś"
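To make that concrete, here is a minimal sketch that measures the example string above. It assumes an environment where TextEncoder is available as a global (modern browsers, recent Node.js), and the variable names are illustrative only.

```javascript
// TextEncoder always produces UTF-8 bytes.
const encoder = new TextEncoder();

// "Äüöôś" written with escapes so the precomposed code points are unambiguous.
const sample = '\u00C4\u00FC\u00F6\u00F4\u015B';

console.log(sample.length);                  // 5 UTF-16 code units
console.log(encoder.encode(sample).length);  // 10 UTF-8 bytes (2 per character)
console.log(encoder.encode('hello').length); // 5, ASCII takes 1 byte per character
```

So string.length and the byte count only agree while every character is plain ASCII.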

Upvotes: 1

JonSG

Reputation: 13152

You are testing with basic ASCII characters. In UTF-8 each of those encodes to a single byte, so the byte count matches string.length. Try with a character outside the ASCII range.

console.log((new TextEncoder('utf-8').encode('😁')).length, '😁'.length);//4,2
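As a rough sketch, the three approaches from the question can be run side by side on non-ASCII input. The wrapper name encoderCount is made up for this example, and a global TextEncoder is assumed.

```javascript
// The functions from the question, unchanged in behavior.
function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}

function getLengthInBytes(str) {
    var b = str.match(/[^\x00-\xff]/g);
    return str.length + (!b ? 0 : b.length);
}

// Hypothetical wrapper around the TextEncoder one-liner.
function encoderCount(s) {
    return new TextEncoder().encode(s).length;
}

// 'hello', 'héllo', '€' and '😁', written with escapes to avoid ambiguity.
for (const s of ['hello', 'h\u00E9llo', '\u20AC', '\u{1F601}']) {
    console.log(s, s.length, byteCount(s), getLengthInBytes(s), encoderCount(s));
}
// hello 5 5 5 5
// héllo 5 6 5 6
// € 1 3 2 3
// 😁 2 4 4 4
```

byteCount and the TextEncoder version agree on the UTF-8 size in every case here. getLengthInBytes gives different numbers for 'héllo' and '€' because it counts one byte for anything up to \xff and two for everything else, which is not how UTF-8 works.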

Upvotes: 1
