I am using Arch Linux with KDE Plasma. I have an XML file of approximately 50 MB that I need to parse. The file has custom tags.
Example XML:
<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<sense>
<pos>&unc;</pos>
<gloss g_type="expl">repetition mark in katakana</gloss>
</sense>
</entry>
</JMdict>
I have tried many solutions suggested on Stack Overflow, but none of them worked, and some packages, such as xml-stream and xml2json, could not be installed on my system. I decided to use xml2js (most answers suggest it), but got the same result. How can I use it correctly?
I am using this code, but it always prints undefined:
const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();
const path = "test.xml";
fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
parser.parseString(data, function(err, res) {
console.log(res);
});
});
Result: undefined
Is there any way to handle an XML file by hand (without a package)?
Upvotes: 5
This solution uses xml2js.
var fs = require('fs'),
    path = require('path'),
    xml2js = require('xml2js');
var parser = new xml2js.Parser();
// path.join builds the path with the correct separator on every platform
let filename = path.join(__dirname, 'foo.xml');
fs.readFile(filename, "utf8", function(err, data) {
    if (err) {
        console.log('Could not read file:');
        console.log(err);
        return;
    }
    // Escape every "&" that does not start &apos; &quot; &gt; &lt; &amp; or a
    // numeric reference (&#...), so an undeclared entity such as &unc;
    // becomes &amp;unc; and no longer makes the parser fail
    var escaped = data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
    parser.parseString(escaped, function (err, result) {
        if (err) {
            console.log('Could not parse XML:');
            console.log(err);
        } else {
            console.log(JSON.stringify(result));
            console.log('Done');
        }
    });
});
The essential part is escaping the stray ampersands before parsing:
data.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;')
The only problem is the &unc; entity in this tag:
<pos>&unc;</pos>
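The replacement can be tried in isolation, without xml2js. This is a minimal stdlib-only sketch; escapeUndeclaredEntities is a hypothetical helper name, and the regex is the one from this answer: it escapes every & that does not begin one of the five predefined XML entities (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference (&#...).

```javascript
// Hypothetical helper: escape any "&" that does not begin one of the five
// predefined XML entities or a numeric character reference (&#...;),
// so that a strict parser no longer rejects undeclared entities like &unc;.
function escapeUndeclaredEntities(xml) {
    return xml.replace(/&(?!(?:apos|quot|[gl]t|amp);|#)/g, '&amp;');
}

const sample = '<pos>&unc;</pos><gloss>a &amp; b &lt; c &#12541;</gloss>';
console.log(escapeUndeclaredEntities(sample));
// prints: <pos>&amp;unc;</pos><gloss>a &amp; b &lt; c &#12541;</gloss>
```

Note the trade-off: after this step the parser no longer fails, but the value of <pos> comes back as the literal text &unc; rather than the entity's meaning.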
Upvotes: 5
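I think your problem is the &unc; entity reference in your XML data: it is not one of XML's predefined entities and is not declared anywhere in the sample, so the parser rejects it.
I was able to get your example to work with this:
XML data: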
<JMdict>
<entry>
<ent_seq>1000000</ent_seq>
<r_ele>
<reb>ヽ</reb>
</r_ele>
<sense>
<pos>YOUR PROBLEM WAS HERE</pos>
<gloss g_type="expl">repetition mark in katakana</gloss>
</sense>
</entry>
</JMdict>
Node.js code:
const fs = require('fs-extra');
const xml2js = require('xml2js');
const parser = new xml2js.Parser();
const path = "test.xml";
fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
    if (error) return console.error(error); // could not read the file
    parser.parseString(data, function(err, res) {
        if (err) return console.error(err); // malformed XML: res is undefined here
        console.log(JSON.stringify(res.JMdict.entry, null, 4));
    });
});
In situations like this, when I know the code should work, I always inspect the input data for possible issues.
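To inspect the input data for this particular issue, a small stdlib-only sketch can list every entity reference that a strict parser would reject. findUndeclaredEntities is a hypothetical helper; it only recognizes simple names such as unc or adj-i and does not read any DOCTYPE declarations, so it reports every non-predefined, non-numeric reference.

```javascript
// Hypothetical helper: list entity references in the input that are neither
// one of the five predefined XML entities nor numeric character references.
const PREDEFINED = new Set(['amp', 'lt', 'gt', 'quot', 'apos']);

function findUndeclaredEntities(xml) {
    const found = new Set();
    // Match &name; where name starts with a letter or "_" (so &#...; is skipped)
    const re = /&([A-Za-z_][\w.-]*);/g;
    let m;
    while ((m = re.exec(xml)) !== null) {
        if (!PREDEFINED.has(m[1])) found.add(m[1]);
    }
    return [...found]; // unique names, in order of first appearance
}

const sample = [
    '<entry>',
    '<pos>&unc;</pos>',
    '<gloss g_type="expl">repetition mark &amp; more</gloss>',
    '</entry>',
].join('\n');
console.log(findUndeclaredEntities(sample)); // [ 'unc' ]
```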
Upvotes: 3
The way you use the xml2js package should be fine. However, your XML is not quite well-formed.
If you log the err argument to see what is causing the error:
fs.readFile(path, {encoding: 'utf-8'}, function(error, data) {
parser.parseString(data, function(err, res) {
if (err) console.log(err);
console.log(res);
});
});
You'll see that the line <pos>&unc;</pos> causes the problem.
If you fix that entity reference, the parser should work fine.
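If the entity should keep its meaning rather than be escaped, one way to fix it is to expand it yourself before parsing. The sketch below rests on an assumption not shown in the question: that the full file declares its entities in an internal DOCTYPE subset as simple <!ENTITY name "value"> lines with plain-text values. expandDeclaredEntities is a hypothetical helper, the DOCTYPE in the usage example is made-up sample data, and the regexes are not a general DTD parser.

```javascript
// Hypothetical helper, assuming entities are declared in an internal DOCTYPE
// subset as simple <!ENTITY name "value"> lines with plain-text values.
// Declared references are replaced by their values; predefined (&amp; etc.),
// numeric, and undeclared references are left unchanged.
function expandDeclaredEntities(xml) {
    const entities = new Map();
    const declRe = /<!ENTITY\s+([A-Za-z_][\w.-]*)\s+"([^"]*)"\s*>/g;
    let m;
    while ((m = declRe.exec(xml)) !== null) {
        entities.set(m[1], m[2]);
    }
    return xml.replace(/&([A-Za-z_][\w.-]*);/g, (ref, name) =>
        entities.has(name) ? entities.get(name) : ref);
}

// Made-up sample with a declaration for "unc"
const sample = [
    '<!DOCTYPE JMdict [',
    '<!ENTITY unc "unclassified">',
    ']>',
    '<JMdict><entry><sense><pos>&unc;</pos></sense></entry></JMdict>',
].join('\n');
console.log(expandDeclaredEntities(sample).split('\n').pop());
// prints: <JMdict><entry><sense><pos>unclassified</pos></sense></entry></JMdict>
```

The expanded string can then be passed to parser.parseString exactly as in the code above.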
Upvotes: 1