Reputation: 704
I was doing this successfully with Python and Beautiful Soup, but now I am trying to port it to Node.js.
My issue is that the loop is not putting each item into its own object in the array; instead it puts everything into a single object, like this:
[
  {
    numbers: '123987456789',
    letters: 'ABCDEFG'
  }
]
Instead of:
[
  {
    numbers: '123',
    letters: 'A'
  },
  {
    numbers: '987',
    letters: 'B'
  }
]
The tricky part is that the divs all have the same ID, so I needed to pick a particular one out of the array of matching divs.
In Python I did this and then used append to add the items to an empty list:
myDivs = soup.select('#my-divs')[2]
numbers = myDivs.select('text.numbers')
labels = myDivs.select('text.labels')
Node.js
const Axios = require("axios");
const cheerio = require("cheerio");

exports.scrapeData = async (req, res) => {
  const html = await Axios.get("https://example.com");
  const $ = await cheerio.load(html.data);
  let tests = [];
  $("#my-divs:eq(2)").each((i, elem) => {
    tests.push({
      numbers: $(elem).find("text.numbers").text(),
      labels: $(elem).find("text.labels").text(),
    });
  });
  console.log(tests);
};
HTML:
<div id="my-divs">
  <text class="numbers">123</text>
  <text class="labels">A</text>
  <text class="numbers">987</text>
  <text class="labels">B</text>
</div>
<div id="my-divs">
  <text class="numbers">567</text>
  <text class="labels">C</text>
  <text class="numbers">543</text>
  <text class="labels">D</text>
</div>
Upvotes: 3
Views: 7457
Reputation: 56935
Here's another approach using spreads and index pairing, like a zip function:
const cheerio = require("cheerio");

const html = `
<div id="my-divs">
  <text class="numbers">123</text>
  <text class="labels">A</text>
  <text class="numbers">987</text>
  <text class="labels">B</text>
</div>
<div id="my-divs">
  <text class="numbers">567</text>
  <text class="labels">C</text>
  <text class="numbers">543</text>
  <text class="labels">D</text>
</div>
`;
const $ = cheerio.load(html);

// Collect the labels of the second div, then pair them with its numbers by index.
const labels = [...$("#my-divs:eq(1) .labels")];
const data = [...$("#my-divs:eq(1) .numbers")].map((e, i) => ({
  numbers: $(e).text(),
  letters: $(labels[i]).text(),
}));
console.log(data);
// => [ { numbers: '567', letters: 'C' }, { numbers: '543', letters: 'D' } ]
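If the index pairing feels implicit, it can be pulled out into a small helper. This is only a sketch: zip is a hypothetical helper (neither Cheerio nor JavaScript ships one), and the two arrays are stand-ins for text that has already been extracted from the .numbers and .labels elements.

```javascript
// Hypothetical helper, not part of Cheerio: pair two equal-length arrays by index.
const zip = (xs, ys) => xs.map((x, i) => [x, ys[i]]);

// Stand-ins for the text already pulled out of the .numbers and .labels elements.
const numberTexts = ["567", "543"];
const labelTexts = ["C", "D"];

// Turn each [number, label] pair into one object per row.
const paired = zip(numberTexts, labelTexts).map(([numbers, letters]) => ({
  numbers,
  letters,
}));

console.log(paired);
// => [ { numbers: '567', letters: 'C' }, { numbers: '543', letters: 'D' } ]
```

Note that this kind of pairing assumes both lists have the same length and order; if a label is missing somewhere, every row after it shifts by one.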
Upvotes: 0
Reputation: 338208
How about this approach: find all .numbers and, for each of them, return an object containing its own .text() and the immediately following .labels, using Cheerio's .next():
const Axios = require("axios");
const cheerio = require("cheerio");

exports.scrapeData = async (req, res) => {
  const html = await Axios.get("https://example.com");
  const $ = await cheerio.load(html.data);
  // One object per .numbers element, paired with the .labels sibling right after it.
  const tests = $("#my-divs:eq(2)").find(".numbers").map(function () {
    return {
      numbers: $(this).text().trim(),
      labels: $(this).next(".labels").text().trim(),
    };
  }).toArray();
  console.log(tests);
};
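Cheerio's .map() binds this to the current element inside the callback (which is why a regular function is used here rather than an arrow function), and Cheerio's .toArray() extracts the underlying JavaScript array from the Cheerio object.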
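One thing to keep in mind is that .next(".labels") only looks at the immediately following sibling, so a .numbers element that is not directly followed by a label ends up with an empty string. The sketch below mimics that behaviour on plain data, without Cheerio; the siblings array and its cls/text fields are made up for illustration, and the extra 555 row is hypothetical.

```javascript
// Plain-data stand-in for the children of one div; each entry mimics a <text> element.
const siblings = [
  { cls: "numbers", text: " 123 " },
  { cls: "labels", text: "A" },
  { cls: "numbers", text: "987" },
  { cls: "numbers", text: "555" }, // hypothetical extra row: 987 has no label right after it
  { cls: "labels", text: "B" },
];

const tests = siblings
  // Remember each node's immediate successor before filtering.
  .map((node, i) => ({ node, next: siblings[i + 1] }))
  .filter(({ node }) => node.cls === "numbers")
  .map(({ node, next }) => ({
    numbers: node.text.trim(),
    // Mimics .next(".labels"): use the immediate sibling only if it is a label.
    labels: next && next.cls === "labels" ? next.text.trim() : "",
  }));

console.log(tests);
// => [
//   { numbers: '123', labels: 'A' },
//   { numbers: '987', labels: '' },
//   { numbers: '555', labels: 'B' }
// ]
```

Unlike pairing by index, a missing label here only affects its own row instead of shifting everything after it.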
Upvotes: 4