Ionică Bizău

Reputation: 113425

Decode special cyrillic characters

When running the following small snippet, we get weird characters in the terminal:

const http = require("http")
http.get("http://www.pravda.com.ua/news/2017/10/6/7157464/", res => {
    res.on("data", e => console.log(e.toString()))
})

...such as: ��������� ���

Why is that happening? When doing curl http://www.pravda.com.ua/news/2017/10/6/7157464/, we get raw question marks (such as: <title>? ? | ?? </title>).

However, the browser seems to get good characters <title>У Кахов...</title>.

Is the server sending different content, or is the difference in how each client (Node.js vs curl vs browsers) interprets it?

Upvotes: 3

Views: 1404

Answers (1)

Damaged Organic

Reputation: 8467

The website you are requesting uses the Windows-1251 encoding, which Node.js's Buffer#toString() does not support out of the box:

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />

Browsers are smart enough to detect that declaration and decode the page correctly; cURL and the raw Node.js HTTP client are not. So, basically, you'll need a third-party module to convert the encoding, for example, iconv-lite:

const http = require("http");
const iconv = require("iconv-lite");

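http.get("http://www.pravda.com.ua/news/2017/10/6/7157464/", (res) => {
  // Decode the windows-1251 byte stream into JavaScript strings,
  // then collect the whole body before printing it.
  res.pipe(iconv.decodeStream("win1251")).collect((err, body) => {
    if (err) throw err;

    console.log(body);
  });
});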

In this snippet, I'm piping the response into iconv-lite's Transform stream, which does all the dirty work.
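
To see where the � characters come from, here is a minimal sketch that uses only Node.js built-ins. It assumes a Node.js build with full ICU data (the default in recent official releases), where the global TextDecoder accepts the "windows-1251" label; that is a swapped-in alternative to iconv-lite, not something the snippet above relies on. The decodeAs helper and the sample bytes are made up for illustration.

```javascript
// Hypothetical helper: decode raw bytes using a given encoding label.
// Assumption: TextDecoder knows the label (windows-1251 needs full ICU).
function decodeAs(bytes, label) {
  return new TextDecoder(label).decode(bytes);
}

// The three bytes a windows-1251 server would send for "У К".
const sample = Buffer.from([0xd3, 0x20, 0xca]);

// Buffer#toString() defaults to UTF-8. These bytes are not valid UTF-8,
// so each Cyrillic byte becomes U+FFFD, the "�" seen in the terminal.
const wrong = sample.toString();

// Decoding with the charset the page declares recovers the text.
const right = decodeAs(sample, "windows-1251");

console.log(JSON.stringify(wrong), right);
```

If TextDecoder throws a RangeError for the label, the Node.js build most likely lacks full ICU data, and iconv-lite remains the safer choice.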

Upvotes: 3
