batman191

Reputation: 57

Getting question marks when trying to get data from API

I'm using node-webkit to build an app that alerts me every time there is an alarm in my country (we are currently in a war). There is a website that supplies a JSON file that contains info about current alarms.
When I try to access that page and check whether there are alarms, all I get back is a lot of question marks. I can't use that, and when I try to JSON.parse the data it fails because of the question marks. What do I do?

url: "http://www.oref.org.il/WarningMessages/alerts.json",
checkAlert: function(callback) {
    request({ 
        uri: this.url,
        json: true,
        encoding: 'utf-8'

        }, function(err, res, json) { 
            if (err)
                return console.log(err);

            json = JSON.parse(json);
            var data = json.data;

            console.log('just checked. json.data: ' + data);

            if (data.length != 0) // if array is not empty
                callback(true);
            else
                callback(false);
    });
}

Here's what the file looks like:

{ 
"id" : "1405751634717",
"title" : "something in hebrew ",
"data" : []
}

Thanks a lot!

Upvotes: 0

Views: 3155

Answers (2)

Elad Nava

Reputation: 7896

That API returns a JSON response encoded in UTF-16-LE, so you'll have to tell request to use that encoding instead.
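
Something along these lines should work — a minimal sketch of checkAlert using the request module with encoding: 'utf16le' (Node's name for UTF-16-LE). The byte-order-mark stripping is an assumption on my part: UTF-16-LE files usually begin with a BOM, which JSON.parse rejects, so the sketch parses the body manually instead of using json: true.

    var request = require('request');

    var url = 'http://www.oref.org.il/WarningMessages/alerts.json';

    function checkAlert(callback) {
        request({
            uri: url,
            // 'utf16le' is Node's name for UTF-16-LE; this replaces encoding: 'utf-8'
            encoding: 'utf16le'
        }, function(err, res, body) {
            if (err)
                return console.log(err);

            // UTF-16-LE files usually start with a byte-order mark (\uFEFF),
            // which JSON.parse rejects, so strip it before parsing (assumption)
            var json = JSON.parse(body.replace(/^\uFEFF/, ''));

            // a non-empty "data" array means there is an active alert
            callback(json.data.length !== 0);
        });
    }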

However, since you're trying to query Pikud Haoref's alerts API, check out pikud-haoref-api on npm to do the heavy lifting for you:

https://www.npmjs.com/package/pikud-haoref-api

(Disclaimer: I created this package)

Upvotes: 1

avcajaraville

Reputation: 9084

Have a look here: jQuery doesn't display Hebrew

And first make sure that your JSON file is actually encoded in UTF-8.

You might want to check how your server is serving that JSON and which encoding it uses.
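
For example, you can fetch the raw bytes with request (encoding: null returns a Buffer) and look at the Content-Type header and the byte-order mark, if there is one — a rough sketch, assuming the request module:

    var request = require('request');

    request({
        uri: 'http://www.oref.org.il/WarningMessages/alerts.json',
        encoding: null // return the body as a raw Buffer instead of a decoded string
    }, function(err, res, body) {
        if (err)
            return console.log(err);

        // the charset, if the server declares one
        console.log('Content-Type:', res.headers['content-type']);

        // the first bytes often reveal the encoding
        if (body[0] === 0xFF && body[1] === 0xFE)
            console.log('BOM says UTF-16-LE');
        else if (body[0] === 0xFE && body[1] === 0xFF)
            console.log('BOM says UTF-16-BE');
        else if (body[0] === 0xEF && body[1] === 0xBB && body[2] === 0xBF)
            console.log('BOM says UTF-8');
        else
            console.log('no BOM; rely on the Content-Type charset');
    });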

Also check this link: http://dougal.gunters.org/blog/2012/03/14/dealing-with-utf-in-node-js/

Quick overview:

“V8 currently only accepts characters in the BMP as input, using UCS-2 as internal representation (the same representation as JavaScript strings).” Basically, this means that JavaScript uses the UCS-2 character encoding internally, which is strictly a 16-bit format, which in turn means that it can only support the first 65,536 code-points of Unicode characters. Any characters that fall outside that range are apparently truncated in the conversion from UTF-8 to UCS-2, mangling the character stream. In my case (as with many others I found in my research) this surfaces when the system attempts to serialize/deserialize these strings as JSON objects. In the conversion, you can end up with character sequences which are invalid UTF-8. When browsers see these broken strings come in, they promptly drop the connection mid-stream, apparently as a security measure. (I sort-of understand this, but would have a hard time explaining it, because these character-encoding discussions give me a headache).
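
To see what the quote means by 16-bit code units, here is a small illustration (Hebrew itself fits inside the BMP, but this shows how JavaScript stores a character that doesn't):

    // a character outside the Basic Multilingual Plane (U+1F600)
    // is stored as a surrogate pair of two 16-bit code units
    var s = '\uD83D\uDE00';
    console.log(s.length);                      // 2
    console.log(s.charCodeAt(0).toString(16));  // 'd83d' (high surrogate)
    console.log(s.charCodeAt(1).toString(16));  // 'de00' (low surrogate)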

Upvotes: 0
