Reputation: 1761
I would like to store the URLs contained in the <loc> tags of an XML file in an array. I don't know how to start, or how to extract the links.
My NodeJS code
const fs = require("fs");
const xml2js = require('xml2js');
const util = require('util');
const parser = new xml2js.Parser();
fs.readFile('example.xml', (err, data) => {
  parser.parseString(data, (err, result) => {
    console.log((util.inspect(result, false, null)));
  });
});
Inputs: XML File example
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.mywebsite.fr/001.html</loc>
    <lastmod>2020-10-24</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://www.mywebsite.fr/002.html</loc>
    <lastmod>2020-10-24</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://www.mywebsite.fr/003.html</loc>
    <lastmod>2020-10-24</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
Expected Outputs
result =
[
'https://www.mywebsite.fr/001.html',
'https://www.mywebsite.fr/002.html',
'https://www.mywebsite.fr/003.html'
]
Upvotes: 2
Views: 422
Reputation: 24930
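Using the sample XML in your question, try something like this:
const xpath = require('xpath');
const { DOMParser } = require('xmldom');

const urls = `[your xml above]`;

// Parse the XML string into a DOM document.
const target = new DOMParser().parseFromString(urls, 'text/xml');

// local-name() matches <loc> regardless of the sitemap's default namespace.
const nodes = xpath.select('//*[local-name()="loc"]/text()', target);

// Collect the text content of each matched node.
const result = nodes.map((node) => node.nodeValue);
console.log(result);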
Output:
[
'https://www.mywebsite.fr/001.html',
'https://www.mywebsite.fr/002.html',
'https://www.mywebsite.fr/003.html'
]
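If you would rather stay with xml2js, as in your own code, the links can also be read straight off the parsed object. The sketch below assumes the parser's default options, under which every child element is wrapped in an array, so each entry of result.urlset.url carries its link as loc[0]. The helper name extractLocs is made up for illustration, and the parsed object is a hand-written stand-in for what parseString should produce from your sample file, so check it against your util.inspect output before relying on it.

```javascript
// Hypothetical helper: collect every <loc> value from an xml2js result object.
// Assumes xml2js default options, where each child element is an array.
function extractLocs(parsed) {
  const entries = (parsed && parsed.urlset && parsed.urlset.url) || [];
  return entries
    // Skip <url> entries that have no <loc> child.
    .filter((entry) => Array.isArray(entry.loc) && entry.loc.length > 0)
    // The first (and normally only) <loc> of each entry holds the link.
    .map((entry) => entry.loc[0]);
}

// Hand-written stand-in for the object parseString is assumed to return
// for the sample sitemap in the question.
const parsed = {
  urlset: {
    $: { xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9' },
    url: [
      { loc: ['https://www.mywebsite.fr/001.html'], lastmod: ['2020-10-24'], changefreq: ['daily'], priority: ['0.5'] },
      { loc: ['https://www.mywebsite.fr/002.html'], lastmod: ['2020-10-24'], changefreq: ['daily'], priority: ['0.5'] },
      { loc: ['https://www.mywebsite.fr/003.html'], lastmod: ['2020-10-24'], changefreq: ['daily'], priority: ['0.5'] },
    ],
  },
};

const links = extractLocs(parsed);
console.log(links);
```

In your own script you would call extractLocs(result) inside the parseString callback in place of the console.log line.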
Upvotes: 1