Reputation: 113425
When running the following small snippet, we get weird characters in the terminal:
const http = require("http")
http.get("http://www.pravda.com.ua/news/2017/10/6/7157464/", res => {
  res.on("data", e => console.log(e.toString()))
})
...such as: ��������� ���
Why is that happening?
When doing curl http://www.pravda.com.ua/news/2017/10/6/7157464/, we get raw question marks (such as: <title>? ? | ?? </title>).
However, the browser seems to get good characters: <title>У Кахов...</title>.
Is the server sending different content, or does the difference come from how each client (Node.js vs curl vs browsers) interprets it?
Upvotes: 3
Views: 1404
Reputation: 8467
The website you are requesting uses the Windows-1251 encoding, which Node.js's Buffer.toString() does not support out of the box:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1251" />
Browsers are smart enough to detect the declared charset and decode the page accordingly, whereas cURL and the raw Node.js http client just hand you the bytes as they are. So, basically, you need to convert the encoding yourself, for example with a third-party module such as iconv-lite:
const http = require("http");
const iconv = require("iconv-lite");
http.get("http://www.pravda.com.ua/news/2017/10/6/7157464/", (res) => {
  res.pipe(iconv.decodeStream("win1251")).collect((err, body) => {
    if (err) throw err;
    console.log(body);
  })
});
In this snippet I'm piping the response into the iconv-lite Transform stream, which decodes the Windows-1251 bytes, and collect() then hands the whole decoded body to the callback as one string.
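If you would rather not add a dependency, here is a rough sketch of the same idea using the built-in TextDecoder. It assumes a Node.js build that ships full ICU data (otherwise the constructor throws a RangeError for windows-1251), and sniffCharset and decodeBody are hypothetical helper names I made up for illustration, not part of any library. The sample bytes at the bottom stand in for a real response.

```javascript
// Sketch only: assumes Node.js was built with full ICU data.
// sniffCharset and decodeBody are hypothetical helpers, not library APIs.

// Look for a charset declaration in the first kilobyte of the raw bytes.
// latin1 maps every byte to exactly one character, so the ASCII markup
// stays readable no matter what the real encoding is.
function sniffCharset(buf, fallback = "utf-8") {
  const head = buf.subarray(0, 1024).toString("latin1");
  const match = /charset\s*=\s*["']?([\w-]+)/i.exec(head);
  return match ? match[1].toLowerCase() : fallback;
}

// Join all chunks before decoding, so the body is decoded in one pass
// with the charset found in its own markup.
function decodeBody(chunks) {
  const raw = Buffer.concat(chunks);
  return new TextDecoder(sniffCharset(raw)).decode(raw);
}

// Stand-in for a response: ASCII markup plus "Привет" as Windows-1251 bytes,
// delivered as two separate chunks.
const markup = Buffer.from('<meta charset="windows-1251"><title>', "latin1");
const word = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);
const decoded = decodeBody([markup, word]);
console.log(decoded);
```

With a real request you would push each chunk into an array in the "data" handler and call decodeBody(chunks) in the "end" handler.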
Upvotes: 3