David
David

Reputation: 553

Change JavaScript string encoding

At the moment I have a large JavaScript string I'm attempting to write to a file, but in a different encoding (ISO-8859-1). I was hoping to use something like downloadify. Downloadify only accepts normal JavaScript strings or base64 encoded strings.

Because of this, I've decided to compress my string using JSZip which generates a nicely base64 encoded string that can be passed to downloadify, and downloaded to my desktop. Huzzah! The issue is that the string I compressed, of course, is still the wrong encoding.

Luckily JSZip can take a Uint8Array as data, instead of a string. So is there any way to convert a JavaScript string into a ISO-8859-1 encoded string and store it in a Uint8Array?

Alternatively, if I'm approaching this all wrong, is there a better solution all together? Is there a fancy JavaScript string class that can use different internal encodings?

Edit: To clarify, I'm not pushing this string to a webpage so it won't automatically convert it for me. I'm doing something like this:

var zip = new JSZip();
zip.file("genSave.txt", result);

return zip.generate({compression:"DEFLATE"});

And for this to make sense, I would need result to be in the proper encoding (and JSZip only takes strings, arraybuffers, or uint8arrays).

Final Edit (This was -not- a duplicate question because the result wasn't being displayed in the browser or transmitted to a server where the encoding could be changed):

This turned out to be a little more obscure than I had thought, so I ended up rolling my own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like I did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or convert it into a windows-1252 encoded string using this string encoding library:

var string = TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

function string_transcoder (target) {

    this.encodeList = encodings[target];
    if (this.encodeList === undefined) {
        return undefined;
    }

    //Initialize the easy encodings
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;          
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; //This encoding is messed up
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};

Upvotes: 23

Views: 33115

Answers (3)

Andreas Covidiot
Andreas Covidiot

Reputation: 4745

The best solution for me was posted here and this is my one-liner:

<!-- Required for non-UTF encodings (quite big) -->
<script src="encoding-indexes.js"></script>

<script src="encoding.js"></script>
...
// windows-1252 is just one typical example encoding/transcoding
let transcodedString = new TextDecoder( 'windows-1252' ).decode( 
                         new TextEncoder().encode( someUtf8String ))

or this if the transcoding has to be applied on multiple inputs reusing the encoder and decoder:

let srcArr = [ ... ]  // some UTF-8 string array
let encoder = new TextEncoder()
let decoder = new TextDecoder( 'windows-1252' )
let transcodedArr = srcArr.forEach( (s,i) => { 
                      srcArr[i] = decoder.decode( encoder.encode( s )) })

(The slightly modified other answer from related question:)

This is what I found after a more specific Google search than just UTF-8 encode/decode. so for those who are looking for a converting library to convert between encodings, here you go.

github.com/inexorabletash/text-encoding

var uint8array = new TextEncoder().encode(str);
var str = new TextDecoder(encoding).decode(uint8array);

Paste from repo readme

All encodings from the Encoding specification are supported:

utf-8 ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6 
iso-8859-7 iso-8859-8 iso-8859-8-i iso-8859-10 iso-8859-13 iso-8859-14 
iso-8859-15 iso-8859-16 koi8-r koi8-u macintosh windows-874 windows-1250 
windows-1251 windows-1252 windows-1253 windows-1254 windows-1255 
windows-1256 windows-1257 windows-1258 x-mac-cyrillic gb18030 hz-gb-2312 
big5 euc-jp iso-2022-jp shift_jis euc-kr replacement utf-16be utf-16le 
x-user-defined

(Some encodings may be supported under other names, e.g. ascii, iso-8859-1, etc. See Encoding for additional labels for each encoding.)

Upvotes: 0

user2511140
user2511140

Reputation: 1688

Test the following script:

<script type="text/javascript" charset="utf-8">

Upvotes: 1

Nate
Nate

Reputation: 13242

This turned out to be a little more obscure than [the author] had thought, so [the author] ended up rolling [his] own solution. It's not nearly as robust as a proper solution would be, but it'll convert a JavaScript string into windows-1252 encoding, and stick it in a Uint8Array:

var enc = new string_transcoder("windows-1252");
var tenc = enc.transcode(result); //This is now a Uint8Array

You can then either use it in the array like [the author] did:

//Make this into a zip
var zip = new JSZip();   
zip.file("genSave.txt", tenc);   
return zip.generate({compression:"DEFLATE"});

Or convert it into a windows-1252 encoded string using this string encoding library:

var string = TextDecoder("windows-1252").decode(tenc);

To use this function, either use:

<script src="//www.eu4editor.com/string_transcoder.js"></script>

Or include this:

function string_transcoder (target) {

    this.encodeList = encodings[target];
    if (this.encodeList === undefined) {
        return undefined;
    }

    //Initialize the easy encodings
    if (target === "windows-1252") {
        var i;
        for (i = 0x0; i <= 0x7F; i++) {
            this.encodeList[i] = i;          
        }
        for (i = 0xA0; i <= 0xFF; i++) {
            this.encodeList[i] = i;
        }
    }

}

string_transcoder.prototype.transcode = function (inString) {


    var res = new Uint8Array(inString.length), i;


    for (i = 0; i < inString.length; i++) {
        var temp = inString.charCodeAt(i);
        var tempEncode = (this.encodeList)[temp];
        if (tempEncode === undefined) {
            return undefined; //This encoding is messed up
        } else {
            res[i] = tempEncode;
        }
    }

    return res;
};

encodings = {

    "windows-1252": {0x20AC:0x80, 0x201A:0x82, 0x0192:0x83, 0x201E:0x84, 0x2026:0x85, 0x2020:0x86, 0x2021:0x87, 0x02C6:0x88, 0x2030:0x89, 0x0160:0x8A, 0x2039:0x8B, 0x0152:0x8C, 0x017D:0x8E, 0x2018:0x91, 0x2019:0x92, 0x201C:0x93, 0x201D:0x94, 0x2022:0x95, 0x2013:0x96, 0x2014:0x97, 0x02DC:0x98, 0x2122:0x99, 0x0161:0x9A, 0x203A:0x9B, 0x0153:0x9C, 0x017E:0x9E, 0x0178:0x9F}     

};

Upvotes: 8

Related Questions