Waterfall

Reputation: 704

how to loop over results using cheerio

I was doing this successfully with Python and Beautiful Soup, but now I am trying to port it to Node.js.

My issue is that the loop is not putting each item into its own object in the array, but is putting everything into one, like this:

[
  {
    numbers: '123987456789',
    letters: 'ABCDEFG'
  }
]

Instead of:

[
  {
    numbers: '123',
    letters: 'A'
  },
  {
    numbers: '987',
    letters: 'B'
  }
]

The tricky part is that the divs have the same ID, so I needed to get a particular one from the array of divs.

In Python I did this and then used append to add the items to an empty list:

myDivs = soup.select('#my-divs')[2]
numbers = myDivs.select('text.numbers')
labels = myDivs.select('text.labels')

Node.js

exports.scrapeData = async (req, res) => {
  const html = await Axios.get(
    "https://example.com"
  );
  const $ = await cheerio.load(html.data);
  let tests = [];
  $("#my-divs:eq(2)").each((i, elem) => {
    tests.push({
      numbers: $(elem).find("text.numbers").text(),
      labels: $(elem).find("text.labels").text(),
    });
  });

  console.log(tests);
};

HTML:

  <div id="my-divs">
      <text class="numbers">123</text>
      <text class="labels">A</text>
      <text class="numbers">987</text>
      <text class="labels">B</text>
  </div>
  <div id="my-divs">
      <text class="numbers">567</text>
      <text class="labels">C</text>
      <text class="numbers">543</text>
      <text class="labels">D</text>
  </div>

Upvotes: 3

Views: 7457

Answers (2)

ggorlen

Reputation: 56935

Here's another approach using spreads and index pairing like a zip function:

const cheerio = require("cheerio");

const html = `
<div id="my-divs">
  <text class="numbers">123</text>
  <text class="labels">A</text>
  <text class="numbers">987</text>
  <text class="labels">B</text>
</div>
<div id="my-divs">
  <text class="numbers">567</text>
  <text class="labels">C</text>
  <text class="numbers">543</text>
  <text class="labels">D</text>
</div>
`;
const $ = cheerio.load(html);
// Collect the label elements from the second div (zero-based index 1).
const labels = [...$("#my-divs:eq(1) .labels")];
// Pair each number element with the label at the same index.
const data = [...$("#my-divs:eq(1) .numbers")].map((e, i) => ({
  numbers: $(e).text(),
  letters: $(labels[i]).text(),
}));
console.log(data);
// => [ { numbers: '567', letters: 'C' }, { numbers: '543', letters: 'D' } ]
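The index pairing in this answer can also be pulled out into a small reusable helper. The following is only a sketch in plain JavaScript: `zip` is a hypothetical helper name (it is not part of Cheerio), and the two sample arrays stand in for the text already extracted from the `.numbers` and `.labels` elements.

```javascript
// Hypothetical helper: pair items from two arrays by index, like Python's zip().
// It stops at the shorter array, so no pair ever contains undefined.
const zip = (a, b) =>
  Array.from({ length: Math.min(a.length, b.length) }, (_, i) => [a[i], b[i]]);

// Sample values standing in for the text of the .numbers and .labels elements.
const numbers = ["567", "543"];
const labels = ["C", "D"];

// Turn each [number, label] pair into the object shape the question asks for.
const data = zip(numbers, labels).map(([n, l]) => ({ numbers: n, letters: l }));
console.log(data);
// => [ { numbers: '567', letters: 'C' }, { numbers: '543', letters: 'D' } ]
```

Truncating to the shorter array is a design choice made here for illustration; if a missing label should be reported instead, compare the two lengths before pairing.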

Upvotes: 0

Tomalak

Reputation: 338208

How about this approach: find all .numbers elements and, for each of them, return an object containing its own .text() and the .text() of the immediately following .labels element, which you can reach with Cheerio's .next():

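const Axios = require("axios");
const cheerio = require("cheerio");

exports.scrapeData = async (req, res) => {
  const html = await Axios.get("https://example.com");
  const $ = cheerio.load(html.data); // load() is synchronous, no await needed

  // :eq(2) selects the third matching div (zero-based index).
  const tests = $("#my-divs:eq(2)").find(".numbers").map(function () {
    return {
      numbers: $(this).text().trim(),
      // The immediately following sibling, if it matches .labels.
      labels: $(this).next(".labels").text().trim(),
    };
  }).toArray();

  console.log(tests);
};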

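Inside the callback passed to Cheerio's .map(), this refers to the current element (which is why the callback is a regular function rather than an arrow function), and Cheerio's .toArray() converts the resulting Cheerio object into a plain JavaScript array.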

Upvotes: 4
