Reputation: 67
I'm working on a project at the moment, have run into an error and need your help!
Basically, I am trying to select the wrapped text inside the following anchor tag
<a href="..." class="productDetailsLink js-productName">Product Name</a>
This is my current code:
await page.waitForSelector('div > div > div > div > div > a[class = "productDetailsLink js-productName"')
    .then(() => page.evaluate(() => {
        const itemArray = [];
        const itemNodeList = document.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"');
        itemNodeList.forEach(item => {
            const itemTitle = item.querySelectorAll('div > div > div > div > div > a[class = "productDetailsLink js-productName"').innerText;
            console.log(itemTitle);
        });
    }));
However, I'm not having any luck, and I've run out of ideas on how to scrape this text.
Upvotes: 3
Views: 1223
Reputation: 11
.innerText worked for me (not .text or .innerHTML).
Credit: I saw it here: https://learnscraping.com/nodejs-web-scraping-with-puppeteer/
For the selector: right-click the element, choose Inspect, then Copy -> Copy JS path.
Below, I copied the JS path of the "Advanced help" link here:
document.querySelector("#mdhelp-tabs > li.float-right > a")
Yes, it comes with document.querySelector and all, ready to paste into your Puppeteer Node.js code.
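As a sketch of how the copied JS path might be combined with .innerText inside Puppeteer (using the selector from the example above):
// Sketch: paste the copied "document.querySelector(...)" into page.evaluate
// and read its .innerText, which is returned back to Node.
const linkText = await page.evaluate(() => {
    return document.querySelector("#mdhelp-tabs > li.float-right > a").innerText;
});
console.log(linkText);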
Upvotes: 0
Reputation: 3529
If those class attributes are unique to that particular anchor <a href="..." class="productDetailsLink js-productName">Product Name</a>, the following method could be used:
await page.evaluate(() => {
let anchorText = document.querySelector('a.productDetailsLink.js-productName').innerHTML;
console.info("anchorText::", anchorText);
});
/*OR another way*/
await page.$eval('a.productDetailsLink.js-productName', e => e.innerHTML);
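Note that console.info inside page.evaluate prints to the browser's console, not your Node terminal. Since page.$eval returns the value back to Node, a small sketch of logging it from the script itself (same selector as above):
// Sketch: capture the anchor text in Node and log it there.
const anchorText = await page.$eval('a.productDetailsLink.js-productName', e => e.innerHTML);
console.log("anchorText::", anchorText);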
If there are a list of anchors:
await page.evaluate(() => {
let anchorList = document.querySelectorAll('a.productDetailsLink.js-productName');
anchorList.forEach(e => {
let anchorText = e.innerHTML;
console.info("anchorText::", anchorText);
});
});
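If you want the list of texts back in your Node script rather than in the browser console, a minimal sketch using page.$$eval (assuming the same selector):
// Sketch: collect the text of every matching anchor and return the array to Node.
const anchorTexts = await page.$$eval(
    'a.productDetailsLink.js-productName',
    anchors => anchors.map(a => a.innerText.trim())
);
console.log(anchorTexts); // e.g. [ 'Product Name', ... ]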
Upvotes: 1
Reputation: 1013
Not sure how Puppeteer works, but I've had great success using cheerio
(https://www.npmjs.com/package/cheerio) for parsing HTML scraped with phantom.
I think you can use Puppeteer like phantom for scraping and then run cheerio on the scraped HTML content, like this:
const cheerio = require('cheerio'); // note: the package is 'cheerio', not 'cherio'
const $ = cheerio.load(content);    // content is the HTML you scraped
const result = $('.productDetailsLink').text();
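For completeness, a minimal sketch of feeding Puppeteer's rendered HTML into cheerio, assuming page is an already-loaded Puppeteer page:
const cheerio = require('cheerio');

// Sketch: grab the rendered HTML from Puppeteer, then query it with cheerio.
const content = await page.content(); // full HTML of the current page
const $ = cheerio.load(content);
const names = $('a.productDetailsLink.js-productName')
    .map((i, el) => $(el).text().trim())
    .get();
console.log(names);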
Upvotes: 1