Reputation: 17
I am learning Puppeteer and want to scrape the StockX page at https://stockx.com/fr-fr/dior-b713-cactus-jack-mocha.
As a first step, I want to scrape the title of the shoe, e.g. "Dior B713 CACTUS JACK".
I tried calling document.querySelector inside page.evaluate to get the title, but the page freezes and throws this error: Error: Evaluation failed: TypeError: Cannot read properties of null (reading 'innerText'). The same selector returns the element correctly in the browser console.
I copied the exact CSS selector from Chrome DevTools, without success. I have also tried several other CSS selector combinations, but I still cannot extract the title.
const scraperObject = {
  url: 'https://stockx.com/fr-fr/dior-b713-cactus-jack-mocha',
  async scraper(browser) {
    const page = await browser.newPage();
    console.log(`Navigating to ${this.url}...`);
    await page.goto(this.url);
    // Wait for the required DOM to be rendered
    const result = await page.evaluate(() => {
      // Throws if the selector matches nothing (querySelector returns null)
      const demandes = document.querySelector('#main-content > div > section:nth-child(3) > div.css-j7qwjs > div > h1').innerText;
      return demandes;
    });
    console.log(result);
    await browser.close();
  }
};
module.exports = scraperObject;
Upvotes: 0
Views: 2085
Reputation: 57204
There's no clear need to work with the DOM here at all, or Puppeteer for that matter, just to retrieve the product name, which is available in a few different places in the static HTML. Search view-source:https://stockx.com/fr-fr/dior-b713-cactus-jack-mocha for "Dior B713 CACTUS JACK" to see all occurrences. The easiest to get appears to be the page title or the twitter:title meta tag.
With Cheerio and Node 18 (for native fetch; install node-fetch if you're on an older Node):
const cheerio = require("cheerio"); // 1.0.0-rc.12
fetch("https://stockx.com/fr-fr/dior-b713-cactus-jack-mocha")
  .then(res => res.text())
  .then(html => {
    const $ = cheerio.load(html);
    console.log($('meta[property="twitter:title"]').attr("content"));
    // or:
    console.log($("title").text());
  });
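One thing the fetch chain above doesn't do is check the response status, so a blocked or failed request would just produce undefined or an empty title. Here's a minimal sketch of a guard, assuming Node 18+; fetchHtml and its fetchImpl parameter are hypothetical names of mine, not part of any library. Whether StockX actually rejects plain HTTP clients is something I haven't verified.

```javascript
// Hypothetical helper: fetch a page and fail loudly on a non-2xx response.
// fetchImpl defaults to the global fetch (Node 18+); pass a stub to test offline.
async function fetchHtml(url, fetchImpl = fetch) {
  const res = await fetchImpl(url);
  if (!res.ok) {
    // res.status carries the HTTP status code, e.g. 403 or 404
    throw new Error(`Request for ${url} failed with status ${res.status}`);
  }
  return res.text();
}
```

The returned string can be passed to cheerio.load exactly like html in the snippet above.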
Puppeteer:
const puppeteer = require("puppeteer"); // ^15.4.0
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36";
  await page.setUserAgent(ua);
  const url = "https://stockx.com/fr-fr/dior-b713-cactus-jack-mocha";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  const title = await page.$eval(
    'meta[property="twitter:title"]',
    el => el.getAttribute("content")
  );
  console.log(title);
  // or:
  console.log(await page.title());
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Output:
Dior B713 CACTUS JACK Mocha
Dior B713 CACTUS JACK Mocha - 3SN281ZNV_H967StockX LogoStockX Logo
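If you do end up needing the rendered h1 rather than the metadata, one common cause of the "Cannot read properties of null" error is that the element isn't in the DOM yet when querySelector runs. A rough sketch of waiting for it first, using Puppeteer's page.waitForSelector and page.$eval; textOf and the 10-second default timeout are my own assumptions, and I haven't confirmed this is what's happening on StockX.

```javascript
// Hypothetical helper: wait for a selector, then read its trimmed text.
// `page` is expected to be a Puppeteer Page; the default timeout is an assumption.
async function textOf(page, selector, timeout = 10000) {
  // waitForSelector rejects with a timeout error if the element never appears
  await page.waitForSelector(selector, {timeout});
  // $eval runs the callback in the page against the first matching element
  return page.$eval(selector, el => el.textContent.trim());
}
```

Inside the async function above you'd call it as await textOf(page, "h1"). A plain "h1" selector is a guess about the page markup; short selectors like that tend to survive layout changes better than the long generated path copied from DevTools.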
Upvotes: 0