Paroo

Reputation: 221

Puppeteer: need help extracting the text from h2 and span elements

Absolute beginner here with JS. I need help extracting text from a DOM that looks like the one below. The extraction itself can be done with querySelectorAll() or getElementsByTagName(), but what I want is an object with the text of each h2 element as a key and the texts of the spans that follow it as its value. I have no idea how to achieve this. Any suggestions would be very helpful.

<div class ="product-list">
  <div class="row  column">
    <div class="column medium-9 large-10">
         <h2 class="product-name">Products List 1</h2>
    </div>
  </div>
  <div class="row">
    <span>First Product</span>
  </div>
  <div class="row">
   <span> Second Product</span>
  </div>
  .
  .
  .
  <div class="row">
    <span>
    Nth Product
    </span>
  </div>
  <div class="row  column">
    <div class="column medium-9 large-10">
         <h2 class="product-name">Products List 2</h2>
    </div>
  </div>
  <div class="row">
    <span>Third Product</span>
  </div>
  <div class="row">
   <span> Fourth Product</span>
  </div>
  .
  .
  .
  <div class="row">
    <span>
    Nth Product
    </span>
  </div>
</div>

From this DOM I need to store the data as:

{
  "Products List 1": ["First Product", "Second Product", ..., "Nth Product"],
  "Products List 2": ["Third Product", "Fourth Product", ..., "Nth Product"]
}
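
In JavaScript, a shape like this is a plain object whose keys are the h2 texts and whose values are arrays of span texts. A minimal sketch of that target shape, with sample values copied from the markup (the variable name `products` is only illustrative):

```javascript
// Target shape: heading text -> array of product texts.
const products = {
  'Products List 1': ['First Product', 'Second Product'],
  'Products List 2': ['Third Product', 'Fourth Product'],
};

// Look up one list by its heading text.
console.log(products['Products List 1']);

// Walk every heading together with its products.
for (const [heading, items] of Object.entries(products)) {
  console.log(`${heading}: ${items.join(', ')}`);
}
```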

JS:

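const products = await page.evaluate(() => {
  const productsArray = [];

  // Heading elements (selected here, but not used yet).
  const pdName1 = document.querySelectorAll('div.column > h2.product-name');

  // Collect the text of every product span into one flat array.
  const pdName2 = document.querySelectorAll('div.row > span');
  pdName2.forEach((span) => {
    productsArray.push(span.innerText);
  });

  return productsArray;
});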
 

Upvotes: 2

Views: 539

Answers (1)

vsemozhebuty

Reputation: 13812

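You can select the h2 and span elements in a single query, which returns them in document order, and start a new array every time you reach an h2. Something like this: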

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

const html = `
  <!doctype html>
  <html>
    <head><meta charset='UTF-8'><title>Test</title></head>
    <body>
      <div class ="product-list">
        <div class="row  column">
          <div class="column medium-9 large-10">
               <h2 class="product-name">Products List 1</h2>
          </div>
        </div>
        <div class="row"><span>First Product</span></div>
        <div class="row"><span> Second Product</span></div>
        <div class="row"><span>Nth Product</span></div>
        <div class="row  column">
          <div class="column medium-9 large-10">
               <h2 class="product-name">Products List 2</h2>
          </div>
        </div>
        <div class="row"><span>Third Product</span></div>
        <div class="row"><span> Fourth Product</span></div>
        <div class="row"><span>Nth Product</span></div>
      </div>
    </body>
  </html>`;

try {
  const [page] = await browser.pages();

  await page.goto(`data:text/html,${html}`);

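  const data = await page.evaluate(() => {
    // querySelectorAll returns matches in document order, so each span
    // comes after the h2 it belongs to.
    const elements = document.querySelectorAll('h2, div.row span');
    const list = {};
    let currentKey = null;

    for (const element of elements) {
      if (element.tagName === 'H2') {
        // A heading starts a new group.
        currentKey = element.innerText.trim();
        list[currentKey] = [];
      } else if (currentKey !== null) {
        // A span goes into the most recent heading's group;
        // spans that appear before the first heading are skipped.
        list[currentKey].push(element.innerText.trim());
      }
    }

    return list;
  });
  console.log(data);
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}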
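
A variation on the same idea, offered as a rough sketch rather than a drop-in solution: pull the elements out of the page as plain { tag, text } records and do the grouping in Node, so the grouping logic can be checked without a browser. The helper name groupByHeading and the record shape are hypothetical, introduced only for this example. The commented-out page.$$eval call assumes `page` is an already open Puppeteer page.

```javascript
// Hypothetical helper: groups flat { tag, text } records under the most
// recent H2 record. Records that come before the first H2 are skipped.
function groupByHeading(records) {
  const grouped = {};
  let currentKey = null;

  for (const { tag, text } of records) {
    if (tag === 'H2') {
      currentKey = text;
      // Keep an existing array if the same heading text shows up twice.
      grouped[currentKey] = grouped[currentKey] ?? [];
    } else if (currentKey !== null) {
      grouped[currentKey].push(text);
    }
  }

  return grouped;
}

// Assumed Puppeteer usage (not executed here):
// const records = await page.$$eval('h2.product-name, div.row > span', (els) =>
//   els.map((el) => ({ tag: el.tagName, text: el.innerText.trim() })));
// const data = groupByHeading(records);

// Stand-in records mirroring the markup from the question.
const sampleRecords = [
  { tag: 'H2', text: 'Products List 1' },
  { tag: 'SPAN', text: 'First Product' },
  { tag: 'SPAN', text: 'Second Product' },
  { tag: 'H2', text: 'Products List 2' },
  { tag: 'SPAN', text: 'Third Product' },
  { tag: 'SPAN', text: 'Fourth Product' },
];

console.log(groupByHeading(sampleRecords));
```

Keeping the in-page callback down to a simple map also sidesteps the fact that functions passed to the page cannot see variables from the Node side.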

Upvotes: 1
