Reputation: 57
I'm using node-webkit to build an app that alerts me every time there is an alarm in my country (we are currently in a war). There is a website that supplies a JSON file that contains info about current alarms.
When I try to access that page and check whether there are alarms, all I get back is a long run of question marks. I can't use that, and when I try to JSON.parse the data it complains that it can't parse the question marks. What do I do?
url: "http://www.oref.org.il/WarningMessages/alerts.json",
checkAlert: function(callback) {
request({
uri: this.url,
json: true,
encoding: 'utf-8'
}, function(err, res, json) {
if (err)
return console.log(err);
json = JSON.parse(json);
var data = json.data;
console.log('just checked. json.data: ' + data);
if (data.length != 0) // if array is not empty
callback(true);
else
callback(false);
});
}
Here's what the file looks like:
{
    "id": "1405751634717",
    "title": "something in hebrew",
    "data": []
}
Thanks a lot!
Upvotes: 0
Views: 3155
Reputation: 7896
That API returns a JSON response encoded in UTF-16-LE, so you'll have to tell request to use that encoding instead.
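For example, a minimal sketch (Node's name for UTF-16 little-endian is 'utf16le'; the BOM-stripping replace is a precaution, since such feeds often start with a byte-order mark):

var request = require('request');

request({
    uri: 'http://www.oref.org.il/WarningMessages/alerts.json',
    encoding: 'utf16le' // decode the response body as UTF-16LE instead of the default UTF-8
}, function(err, res, body) {
    if (err)
        return console.log(err);
    // strip a possible byte-order mark, then parse manually
    var json = JSON.parse(body.replace(/^\uFEFF/, ''));
    console.log(json.data);
});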
However, since you're trying to query Pikud Haoref's alerts API, check out pikud-haoref-api on npm to do the heavy lifting for you:
https://www.npmjs.com/package/pikud-haoref-api
(Disclaimer: I created this package)
Upvotes: 1
Reputation: 9084
Have a look here: jQuery doesn't display Hebrew
First, make absolutely sure that your JSON files are actually encoded in UTF-8.
You might want to check how your server serves that JSON and which encoding it declares.
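One quick way to check (a sketch using the same request module; with encoding: null the body arrives as a raw Buffer, so you can inspect the actual bytes):

var request = require('request');

request({
    uri: 'http://www.oref.org.il/WarningMessages/alerts.json',
    encoding: null // return the raw bytes instead of decoding them
}, function(err, res, body) {
    if (err)
        return console.log(err);
    console.log(res.headers['content-type']); // charset the server declares, if any
    console.log(body.slice(0, 4));            // e.g. <Buffer ff fe ...> means a UTF-16LE BOM
});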
Check also this link: http://dougal.gunters.org/blog/2012/03/14/dealing-with-utf-in-node-js/
Quick overview:
“V8 currently only accepts characters in the BMP as input, using UCS-2 as internal representation (the same representation as JavaScript strings).” Basically, this means that JavaScript uses the UCS-2 character encoding internally, which is strictly a 16-bit format, which in turn means that it can only support the first 65,536 code-points of Unicode characters. Any characters that fall outside that range are apparently truncated in the conversion from UTF-8 to UCS-2, mangling the character stream. In my case (as with many others I found in my research) this surfaces when the system attempts to serialize/deserialize these strings as JSON objects. In the conversion, you can end up with character sequences which are invalid UTF-8. When browsers see these broken strings come in, they promptly drop the connection mid-stream, apparently as a security measure. (I sort-of understand this, but would have a hard time explaining it, because these character-encoding discussions give me a headache).
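As an aside, you can see this 16-bit representation directly: a code point outside the BMP takes up two JavaScript string positions (a surrogate pair), e.g.:

var s = '\uD834\uDF06'; // U+1D306 TETRAGRAM FOR CENTRE, outside the BMP
console.log(s.length);                     // 2: two 16-bit code units, not one character
console.log(s.charCodeAt(0).toString(16)); // 'd834' (high surrogate)
console.log(s.charCodeAt(1).toString(16)); // 'df06' (low surrogate)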
Upvotes: 0