Admin UP Station

Reputation: 23

Ajax request returns '?' for Chinese characters

I run a scraping site that uses Node.js to fetch articles. I want to load a Chinese website using XMLHttpRequest, and that site declares its encoding with this meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=gbk" />

My own site uses the UTF-8 charset. This is the request code:

const xhr = new XMLHttpRequest();
xhr.open("GET", url, true);
xhr.setRequestHeader('Content-Type', 'text/html; charset=gbk');
xhr.onreadystatechange = function () {
    // Wait until the response has been fully received
    if (xhr.readyState !== 4) return;
    // DOM processing
    const $ = cheerio.load(xhr.responseText);
};
xhr.send();

Does anyone know what I have to set in the header? I tried the charsets gbk and GB2312, but neither worked. Any help would be great. Thanks.

Upvotes: 2

Views: 447

Answers (1)

K.Sam

Reputation: 36

I think you are using node-XMLHttpRequest: https://github.com/driverdan/node-XMLHttpRequest

Its README lists this under "Known Issues / Missing Features": "Local file access may have unexpected results for non-UTF8 files".

So I don't think this can be solved with node-XMLHttpRequest.
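To illustrate why no header can repair the text afterwards, here is a minimal sketch. It assumes the library decodes the response body as UTF-8 before exposing responseText, which I have not verified for every version. The four bytes are the gbk encoding of the two characters "中文"; the variable names are only for illustration.

```javascript
// gbk-encoded bytes for the two characters "中文"
const gbkBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Decoding them as UTF-8 replaces the invalid sequences with U+FFFD
const asUtf8 = gbkBytes.toString('utf8');

// Re-encoding the damaged string does not give the original bytes back
const roundTripped = Buffer.from(asUtf8, 'utf8');

console.log(JSON.stringify(asUtf8));
console.log(roundTripped.equals(gbkBytes));
```

Once the bytes have been decoded with the wrong charset, the original data is lost, so the decoding has to happen on the raw bytes.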

Here is my solution for scraping a site that uses gbk. I hope it is useful to you.

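const rp = require('request-promise')
const cheerio = require('cheerio')
const iconv = require('iconv-lite')

const options = {
    url: 'http://www.duchang.org/',
    // encoding: null returns the body as a raw Buffer instead of a UTF-8 string
    encoding: null,
    transform: function (body) {
        // Decode the gbk bytes into a string before parsing the HTML
        const html = iconv.decode(body, 'gbk')
        return cheerio.load(html)
    }
}
rp(options)
    .then(($) => {
        // Homepage headlines; the page title is a quick check that decoding worked
        console.log($('title').text())
    })
    .catch(function (err) {
        // Log the error; rethrowing here would only cause an unhandled rejection
        console.error(err)
    })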

Upvotes: 1
