Nick White

Reputation: 1612

Read a hosted XML file with NodeJS

OK, so I have tried multiple XML libraries that NodeJS has to offer, and I can't work out how to have NodeJS read an XML file from a website.

I can pull the file using http.request, http.get and so on, but getting NodeJS to actually do anything with the data in the XML file is another story.

I'm sure I must be missing something: whenever I convert the XML to JS with xml-stream, it doesn't work when the file comes from a website. My code runs when I host the file myself, but I am using an API and it only provides XML.

Current code:

    var http = require('http');
    var XmlStream = require('xml-stream');
    var options = { host: 'cloud.tfl.gov.uk',
                    path: '/TrackerNet/LineStatus' };
    var twitter = { host: 'api.twitter.com',
                    path: '/1/statuses/user_timeline.rss?screen_name=nwhite89' };

    var request = http.get(options).on('response', function(response) {

      response.setEncoding('utf8');
      var xml = new XmlStream(response);

      xml.on('updateElement: item', function(item) {
        item.title = item.title.match(/^[^:]+/)[0] + ' on ' +
          item.pubDate.replace(/ +[0-9]{4}/, '');
      });

      xml.on('text: item > pubDate', function(element) {
        element.$text = element.$text;
      });

      xml.on('data', function(data) {
        process.stdout.write(data);
      });
    });

What I don't understand is that with twitter everything works fine and the xml.on("data") handler produces output, but with options (cloud.tfl.gov.uk) nothing is output. Even if I put console.log("hi") inside the data handler, it doesn't get executed.

I know that the URL is correct: calling console.log(xml) or console.log(response) right after creating the xml variable shows that it has connected. Any help would be greatly appreciated; I have been stuck on this for a good 2 days now.

Upvotes: 2

Views: 6454

Answers (1)

loganfsmyth

Reputation: 161447

There is a byte order mark (BOM) before the <?xml tag. xml-stream trips up on it, and it stops xml-stream from reading the encoding declared in the tag. That means you need to provide the encoding yourself.

Instead of this:

    response.setEncoding('utf8');
    var xml = new XmlStream(response);

Just do this:

    response.setEncoding('utf8');
    var xml = new XmlStream(response, 'utf8');

And really, setting the encoding on the stream is optional.

    var xml = new XmlStream(response, 'utf8');

works just fine.

More info here: http://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
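To see concretely what the BOM looks like from Node, here is a small sketch. The bytes in `body` are made up to mirror the start of a response that begins with a BOM followed by `<?xml`; they were not captured from the TfL feed. U+FEFF encodes to the three bytes EF BB BF, and decoding with `Buffer#toString` keeps it as the first character, so the text does not literally start with `<?xml`.

```javascript
// The UTF-8 byte order mark is the character U+FEFF encoded as UTF-8.
const bom = Buffer.from('\uFEFF', 'utf8');
console.log(bom); // <Buffer ef bb bf>

// Hypothetical first bytes of a response: a BOM, then "<?xml".
const body = Buffer.from([0xef, 0xbb, 0xbf, 0x3c, 0x3f, 0x78, 0x6d, 0x6c]);

// Buffer#toString keeps the BOM as the first character of the string.
const text = body.toString('utf8');
console.log(text.startsWith('<?xml'));        // false
console.log(text.charCodeAt(0).toString(16)); // feff
```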

If you look at the buffer emitted from response rather than from xml, the buffer starts with

    <Buffer ef bb bf 3c 3f 78 6d ...>
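One way to get that view yourself is to log the first chunk without calling `setEncoding`, so the `data` events carry Buffers rather than strings. A small sketch; `hexPreview` is a hypothetical helper, not part of Node or xml-stream, and the live usage is only shown as a comment because it needs network access.

```javascript
// Hypothetical helper: format the first `n` bytes of a Buffer as
// space-separated, zero-padded hex pairs (e.g. "ef bb bf 3c").
function hexPreview(buf, n = 8) {
  const head = buf.subarray(0, n);
  return Array.from(head, (b) => b.toString(16).padStart(2, '0')).join(' ');
}

// Usage against a live response, in a separate debugging run
// (do not call res.setEncoding first, or chunks will be strings):
//   const http = require('http');
//   http.get(options, (res) => {
//     res.once('data', (chunk) => console.log(hexPreview(chunk)));
//   });
```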

The first 3 bytes are the UTF-8 byte order mark, and after them comes the start of the <?xml tag. xml-stream expects only whitespace between the start of the file and the <?xml tag, but a byte order mark doesn't count as whitespace.
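Passing the encoding to `XmlStream` is the direct fix. If you instead buffer the whole response and hand it to a parser that takes a complete document, another option (not part of the original answer) is to drop the BOM bytes yourself before parsing. A minimal sketch, with `stripUtf8Bom` as a hypothetical helper name:

```javascript
// The UTF-8 byte order mark: U+FEFF encoded as UTF-8.
const UTF8_BOM = Buffer.from([0xef, 0xbb, 0xbf]);

// Hypothetical helper: return `buf` without a leading UTF-8 BOM.
// The result is a view over the same memory (no copy); a buffer that
// does not start with a BOM is returned unchanged.
function stripUtf8Bom(buf) {
  if (buf.length >= UTF8_BOM.length &&
      buf.subarray(0, UTF8_BOM.length).equals(UTF8_BOM)) {
    return buf.subarray(UTF8_BOM.length);
  }
  return buf;
}

// Usage sketch (assumes `chunks` holds every Buffer from the response):
//   const xmlText = stripUtf8Bom(Buffer.concat(chunks)).toString('utf8');
```

Apply it to the complete, concatenated body rather than to each stream chunk, since a BOM can only appear at the very start of the document.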

Upvotes: 6
