Kaan Taha Köken

Reputation: 963

Parsing XML file in Node.js

I am using an Arch Linux system with KDE Plasma. I have an XML file of approximately 50 MB that I need to parse. The file uses custom tags.

Example XML:

<JMdict>
   <entry>
      <ent_seq>1000000</ent_seq>
      <r_ele>
         <reb>ヽ</reb>
      </r_ele>
      <sense>
         <pos>&unc;</pos>
         <gloss g_type="expl">repetition mark in katakana</gloss>
      </sense>
   </entry>
</JMdict>

I have tried many solutions suggested on Stack Overflow, and none of them worked. Some packages, such as xml-stream and xml2json, could not even be installed on my system. I decided to use xml2js, since most answers recommend it, but got the same result. How do I use it correctly? I am using this code, but it always logs undefined:

const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();

const path = "test.xml";

fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     parser.parseString(data, function(err, res) {
         console.log(res);
     });
});

Result: undefined

Is there any way to handle an XML file by hand (without a package)?

Upvotes: 5

Views: 16501

Answers (3)

R.G.Krish

Reputation: 497

This solution uses xml2js.

const fs = require('fs');
const slash = require('slash');
const xml2js = require('xml2js');

const parser = new xml2js.Parser();

// Normalize the path to forward slashes.
const filename = slash(__dirname + '/foo.xml');

fs.readFile(filename, 'utf8', function (err, data) {
    if (err) {
        console.log('Failed to read the file:');
        console.log(err);
        return;
    }

    // Escape every "&" that does not start a predefined entity or a
    // numeric character reference, so that &unc; no longer breaks the parser.
    const escaped = data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');

    parser.parseString(escaped, function (err, result) {
        if (err) {
            console.log('Failed to parse the XML:');
            console.log(err);
        } else {
            console.log(JSON.stringify(result));
            console.log('Done');
        }
    });
});

The exact change you need is this replacement:

data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')

The only problem is the undeclared entity &unc; in the tag below:

<pos>&unc;</pos>

Reference and thanks to @tim.
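To see what the replacement does in isolation, here is a minimal sketch that applies the same regular expression to a short string. The function name escapeUnknownEntities and the sample string are made up for illustration and are not part of xml2js.

```javascript
// Escape every '&' that does not start one of the five predefined XML
// entities or a numeric character reference (&#...;).
// escapeUnknownEntities is a hypothetical helper name.
function escapeUnknownEntities(xml) {
  return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

// Hypothetical sample mixing an undeclared entity with valid references.
const sample = '<pos>&unc;</pos><gloss>a &amp; b &lt; c &#12541;</gloss>';

console.log(escapeUnknownEntities(sample));
// prints: <pos>&amp;unc;</pos><gloss>a &amp; b &lt; c &#12541;</gloss>
```

Note that after this step the parsed text of the pos element is the literal string "&unc;", not whatever the entity was meant to expand to.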

Upvotes: 5

tamak

Reputation: 1591

I think your problem is an unescaped entity in your XML data.

I'm able to get your example to work by using this:

xml data:

<JMdict>
    <entry>
        <ent_seq>1000000</ent_seq>
        <r_ele>
            <reb>ヽ</reb>
        </r_ele>
        <sense>
             <pos>YOUR PROBLEM WAS HERE</pos>
             <gloss g_type="expl">repetition mark in katakana</gloss>
        </sense>
    </entry>
</JMdict>

node.js code:

const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();

const path = "test.xml";

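fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     if (error) throw error;
     parser.parseString(data, function(err, res) {
         // Surface parse errors instead of silently getting undefined.
         if (err) throw err;
         console.log(JSON.stringify(res.JMdict.entry, null, 4));
     });
});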

In situations like this, when I know the code should work, I always check the input data for possible issues.
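One way to check the input is to list the entity references a parser will not know without a DTD. The following is only a sketch: findUnknownEntities and the sample string are hypothetical, and it assumes Node.js 12 or later for String.prototype.matchAll.

```javascript
// The five entities every XML parser understands without a DTD.
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'apos', 'quot']);

// Count named entity references that are not predefined. Numeric
// references such as &#12541; do not match the pattern and are skipped.
// findUnknownEntities is a hypothetical helper name.
function findUnknownEntities(xml) {
  const counts = new Map();
  for (const match of xml.matchAll(/&([A-Za-z_][\w.-]*);/g)) {
    const name = match[1];
    if (!PREDEFINED.has(name)) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  return counts;
}

// Hypothetical sample: two uses of &unc; plus a valid &amp;.
const sample = '<pos>&unc;</pos><pos>&unc;</pos><gloss>a &amp; b</gloss>';
for (const [name, count] of findUnknownEntities(sample)) {
  console.log(name + ': ' + count);
}
```

For the sample above this reports unc with a count of 2, which points directly at the tags that need attention.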

Upvotes: 3

Ray Chan

Reputation: 1180

The way you use the xml2js package should be fine. However, your XML is not quite well-formed.

If you add a console.log for the error, you can see what is causing it:

fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
     parser.parseString(data, function(err, res) {
         if (err) console.log(err);

         console.log(res);
     });
});

You'll see that the line <pos>&unc;</pos> causes the problem: &unc; is not one of the predefined XML entities, and nothing in the file declares it. If you fix such entities, the parser should work fine.
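One possible way to fix the entities before parsing is to substitute each custom entity with its intended text. The sketch below assumes a lookup table; ENTITY_TABLE, its single entry, and the helper names are hypothetical, and in a real file the names and expansions would have to come from the entity declarations in its DTD.

```javascript
// Hypothetical lookup table; the expansion text is an assumption and
// should be taken from the file's own <!ENTITY ...> declarations.
const ENTITY_TABLE = { unc: 'unclassified' };

// Escape characters that are special in XML text content.
function escapeXmlText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Replace each named entity found in the table with its escaped expansion;
// anything not in the table (for example &amp;) is left untouched.
function expandEntities(xml, table) {
  return xml.replace(/&([A-Za-z_][\w.-]*);/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(table, name)
      ? escapeXmlText(table[name])
      : whole
  );
}

console.log(expandEntities('<pos>&unc;</pos><gloss>a &amp; b</gloss>', ENTITY_TABLE));
// prints: <pos>unclassified</pos><gloss>a &amp; b</gloss>
```

The resulting string contains only predefined entities, so it can be passed to parser.parseString as in the snippet above.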

Upvotes: 1
