Brandon McMullen

Reputation: 67

Node.js & Puppeteer - How to select text wrapped inside an anchor tag

I'm working on a project at the moment and have run into an error that I need help with.

Basically, I am trying to select the text wrapped inside the following anchor tag:

<a href="..." class="productDetailsLink js-productName">Product Name</a>

This is my current code:

 await page.waitForSelector('div > div > div > div > div > a[class = "productDetailsLink js-productName"')
        .then(() => page.evaluate(() => {
            const itemArray = [];
            const itemNodeList = document.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"');
            

            itemNodeList.forEach(item => {
                const itemTitle = item.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"').innerText;
                console.log(itemTitle);
            })
        } ))

However, I'm not having any luck, and I've run out of ideas on how to scrape this text.

Upvotes: 3

Views: 1223

Answers (3)

Genghiz Khan

Reputation: 11

.innerText worked for me (not .text or .innerHTML).

Credit: saw it here: https://learnscraping.com/nodejs-web-scraping-with-puppeteer/

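For the selector: right-click the element, choose Inspect, then right-click the highlighted node in DevTools and choose Copy -> Copy JS path.

Below is the JS path I copied for the "Advanced help" link on this page:

document.querySelector("#mdhelp-tabs > li.float-right > a")

Yes, it comes wrapped in document.querySelector(...), ready to paste into your Puppeteer Node.js code (inside page.evaluate, since document only exists in the browser page).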
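A minimal sketch of the same idea using Puppeteer's page.$eval, assuming `page` is a Puppeteer Page that has already navigated to the target site. The helper name textOf is hypothetical, and trimming the whitespace is an added convenience. It waits for the element first and then reads innerText in the browser, returning the string to Node:

```javascript
// Hypothetical helper: return the visible text of the first element matching `selector`.
// `page` is assumed to be a Puppeteer Page that has already navigated somewhere.
async function textOf(page, selector) {
  // page.$eval throws if nothing matches, so wait for the element first.
  await page.waitForSelector(selector);
  // The callback runs inside the browser page; its return value is sent back to Node.
  return page.$eval(selector, (el) => el.innerText.trim());
}
```

For example, textOf(page, "#mdhelp-tabs > li.float-right > a") would resolve to the text of the link whose JS path was copied above.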

Upvotes: 0

ambianBeing

Reputation: 3529

If those classes are unique to that particular anchor, <a href="..." class="productDetailsLink js-productName">Product Name</a>, the following method can be used:

const anchorText = await page.evaluate(() => {
 // Runs in the browser page; return the value so it reaches Node.js.
 return document.querySelector('a.productDetailsLink.js-productName').innerHTML;
});
console.info("anchorText::", anchorText);

/* Or, more concisely, with page.$eval: */
const text = await page.$eval('a.productDetailsLink.js-productName', e => e.innerHTML);

If there is a list of anchors:

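const anchorTexts = await page.evaluate(() => {
 const anchorList = document.querySelectorAll('a.productDetailsLink.js-productName');
 const texts = [];
 anchorList.forEach(e => texts.push(e.innerHTML));
 return texts; // returned to Node.js instead of logged in the browser console
});
anchorTexts.forEach(text => console.info("anchorText::", text));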
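For the list case, Puppeteer's page.$$eval runs querySelectorAll in the page and hands the matches to the callback as an array. A minimal sketch, assuming `page` is a Puppeteer Page already on the product listing; the names PRODUCT_LINK, collectTexts and scrapeProductNames are hypothetical. It waits for the anchors first, as the question's code does, and returns the texts to Node rather than logging them inside the browser:

```javascript
// Compound class selector: matches anchors carrying both classes, in any order.
const PRODUCT_LINK = "a.productDetailsLink.js-productName";

// Runs inside the browser: $$eval passes the matched elements as an array.
function collectTexts(anchors) {
  return anchors.map((a) => a.innerText.trim());
}

// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function scrapeProductNames(page) {
  await page.waitForSelector(PRODUCT_LINK);
  return page.$$eval(PRODUCT_LINK, collectTexts);
}
```

const names = await scrapeProductNames(page) then gives one string per matching anchor. The compound class selector is used because an attribute selector such as a[class = "productDetailsLink js-productName"] only matches when the class attribute is exactly that string, so extra classes or a different order would break it.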

Upvotes: 1

VPaul

Reputation: 1013

I'm not sure how Puppeteer works, but I've had great success using cheerio (https://www.npmjs.com/package/cheerio) to parse HTML scraped with PhantomJS.

I think you can use Puppeteer the same way as PhantomJS for scraping, then run cheerio on the scraped HTML content like this:

const cheerio = require('cheerio');
const content = await page.content(); // HTML of the page Puppeteer has loaded
const $ = cheerio.load(content);
const result = $('.productDetailsLink').text(); // text of all matched elements, concatenated

Upvotes: 1
