ditoslav

Reputation: 4892

Node js buffer.toString() encoding problems

I'm parsing a website that uses the Windows-1250 charset, and for the last 3 days I haven't been able to get my page to show the data in the same encoding. My guess is that the problem is somewhere in getting the data from the buffer or to the buffer. I tried installing the iconv module, but that brought a whole new set of problems, so I was wondering whether there is a way to fix this without using iconv.

Basically, I'm getting "ANDRIJAŠEVCI" from the website, and after the code below I get "ANDRIJA?EVCI".

    var options2 = {
        host: 'vred.hzinfra.hr',
        path: '/hzinfo/default.asp?Category=hzinfo&Service=vred3',
        headers: {"Accept-Charset": "Windows-1250,utf-8;ISO-8859-3,utf-8;ISO-8859-2,utf-8", "Content-Type": "text/html; charset=ISO-8859-2" }
    }

    var request2 = http.request(options2, function (res){
        var data = new Buffer(0,'utf-8');
        res.on('data', function (chunk) {
            data = Buffer.concat([data,chunk]);
        });
        res.on('end', function () {
            console.log(data.toString('utf-8'));
        });
    });
    request2.end();

Upvotes: 2

Views: 7076

Answers (1)

Golo Roden

Reputation: 150882

There are several problems in your code.

  1. Node.js calls the encoding utf8; utf-8 is accepted as an alias, so this spelling is not what breaks the output.
  2. The website returns Windows-1250, but you treat it as UTF-8. This cannot work.
  3. Buffer does not support the Windows-1250 encoding, so this won't work using toString() alone, no matter what you do (unless you convert the raw bytes yourself, which I wouldn't recommend for obvious reasons).
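To make points 2 and 3 concrete, here is a minimal sketch of what happens to the bytes. The byte values are the Windows-1250 codes for the Croatian letters outside ASCII; the helper name `decodeWindows1250Subset` and its partial table are illustrative assumptions, not a complete decoder, and they show why hand-converting raw bytes is tedious rather than something to copy into production.

```javascript
// Hypothetical partial table: Windows-1250 byte -> Unicode character.
// Only the Croatian letters outside ASCII are listed; a real decoder
// would need an entry for every byte from 0x80 to 0xFF.
var CP1250_SUBSET = {
    0x8A: 'Š', 0x9A: 'š', 0x8E: 'Ž', 0x9E: 'ž',
    0xC8: 'Č', 0xE8: 'č', 0xC6: 'Ć', 0xE6: 'ć',
    0xD0: 'Đ', 0xF0: 'đ'
};

function decodeWindows1250Subset(buf) {
    var out = '';
    for (var i = 0; i < buf.length; i++) {
        var b = buf[i];
        if (b < 0x80) {
            out += String.fromCharCode(b);   // ASCII is identical in Windows-1250
        } else if (CP1250_SUBSET.hasOwnProperty(b)) {
            out += CP1250_SUBSET[b];
        } else {
            out += '\uFFFD';                 // byte not covered by this sketch
        }
    }
    return out;
}

// "ANDRIJAŠEVCI" as the server sends it: Š is the single byte 0x8A.
var raw = Buffer.from([0x41, 0x4E, 0x44, 0x52, 0x49, 0x4A, 0x41, 0x8A,
                       0x45, 0x56, 0x43, 0x49]);

// 0x8A on its own is not valid UTF-8, so Š becomes U+FFFD here.
console.log(raw.toString('utf8'));
// The table lookup recovers the intended letter.
console.log(decodeWindows1250Subset(raw));
```

The lone byte 0x8A is a UTF-8 continuation byte with nothing to continue, which is why decoding it as UTF-8 yields the replacement character (often rendered as "?").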

So, to cut a long story short: what you want is hardly possible without an additional library. Basically, you already found the way to go (iconv), but you wrote that there were some additional problems. As you did not say what these problems were, I can only give you the rather generic advice that your code should look somewhat like this:

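    var iconv = require('iconv');
    // data is the Buffer holding the raw Windows-1250 response bytes
    var converter = new iconv.Iconv('windows-1250', 'UTF-8');
    data = converter.convert(data).toString();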
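Where the conversion sits in the request handler matters: the chunks should stay raw Buffers until the response has ended, and only then be decoded once. A minimal sketch of that shape follows; `collectAndDecode` and its `decode` parameter are hypothetical names introduced here, with `decode` standing in for whatever converter you end up using.

```javascript
// Sketch: gather the raw bytes first, decode once at the end.
// `decode` is a hypothetical callback that takes a Buffer and returns a string,
// e.g. function (buf) { return converter.convert(buf).toString(); }
function collectAndDecode(res, decode, done) {
    var chunks = [];
    res.on('data', function (chunk) {
        chunks.push(chunk);                    // keep Buffers, no toString() per chunk
    });
    res.on('end', function () {
        done(decode(Buffer.concat(chunks)));   // one conversion over the full body
    });
}
```

Inside the callback passed to `http.request`, this would be called as `collectAndDecode(res, decode, console.log)` in place of the hand-written `data` and `end` handlers.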

Hope this helps…

Upvotes: 2
