Reputation: 1626
Running on Puppeteer, with everything up to date.
The intended process is to go to the website, where the URL is url/{search item}, and run through the list of search names. Then, for each search item's results page, get the name, price, and image URL of each listing. Right now it throws an error saying it cannot find the selector. I'd appreciate any help on this, many thanks!
The listing markup on the website is laid out as follows:
<div class="items-box-content">
<section class="items-box">
<a href="https://listingurl">
<figure class="items-box-photo">
<img data-src="https://imageurl.jpg" class=" lazyloaded" src="https://imageurl.jpg">
</figure>
<div class="items-box-main">
<h3 class="items-box-name"> listing name </h3>
<div class="items-box-figure">
<div class="items-price font-4"> $29.95 </div> <!-- item's price -->
</div>
</div>
</a>
</section>
</div>
And what I have now is (which throws the error):
const puppeteer = require('puppeteer');
const searches = ["a", "b", "c"]; //appended to url
(async () => {
const browser = await puppeteer.launch({ headless: false });
let results =[];
for (const search of searches) {
try {
page = await browser.newPage();
await page.goto(`https://weburl/?keyword=${search}`);
await page.evaluate(() => { document.querySelector('div[class*="items-box"]').scrollIntoView();});
let elements = await page.$$('div[class*="items-box"]');
for (let element of elements) {
let listImg = await element.$eval(('img[class="items-box-photo]'), img => img.getAttribute('src'));
let listTitle = await element.$eval(('d[class="items-box-main"] > h[class="items-box-name"]'), node => node.innerText.trim());
let listPrice = await element.$eval(('d[class="items-box-figure"] > d[class="items-price"]'), node => node.innerText.trim());
let listUrl = await element.$eval(('d[class="items-box-content"] > a[class*="items-box"]'), node => node.getAttribute('href'));
results.push({
listImg,
listTitle,
listPrice,
listUrl
})
return results;
}
} finally {
await page.close
}
}
})();
The error thrown is
(node:5168) UnhandledPromiseRejectionWarning: Error: Error: failed to find element matching selector "img[class="items-box-photo]"
Upvotes: 2
Views: 12124
Reputation: 4444
I updated your code with my own test and debug changes:
const puppeteer = require('puppeteer');
const searches = ["a"];
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  // Resolve after `timeout` milliseconds.
  function delay(timeout) {
    return new Promise((resolve) => {
      setTimeout(resolve, timeout);
    });
  }
  let results = [];
  for (const search of searches) {
    let page; // declared outside try so that finally can see it
    try {
      page = await browser.newPage();
      await page.goto(`https:url/`);
      await page.evaluate(() => { document.querySelector('section[class*="items-box"]').scrollIntoView(); });
      let elements = await page.$$('section[class*="items-box"]');
      console.log(elements.length);
      console.log('wait 6 seconds');
      await delay(6000);
      for (let element of elements) {
        // await delay(6000);
        let listImg = await element.$eval(('img'), img => img.getAttribute('src'));
        let listTitle = await element.$eval(('h3[class="items-box-name font-2"]'), node => node.innerText.trim());
        let listPrice = await element.$eval(('div[class="items-box-price font-5"]'), node => node.innerText.trim());
        let listUrl = await element.$eval(('div[class="items-box-content clearfix"] a'), node => node.getAttribute('href'));
        results.push({
          listImg,
          listTitle,
          listPrice,
          listUrl
        });
      }
      debugger;
    } catch (error) {
      console.log(error);
    } finally {
      // Close only this tab here; the browser is closed once, after the loop.
      if (page) await page.close();
    }
  }
  await browser.close();
  console.log(results);
  return results;
})();
Updated content:
1. Moved return results out of the inner for loop, so the function no longer exits after the first listing:
for () {
return results;
}
=>
for () {
}
return results;
2. Changed the querySelector selectors:
section[class*="items-box"] // select each listing's section instead of a div
img // there is only one img tag in each "element"
h3[class="items-box-name font-2"] // removed the outer element from the selector
div[class="items-box-price font-5"] // instead of div[class="items-box-figure"] > div[class="items-price font-4"]; the class name seems to be items-box-price on my side
div[class="items-box-content clearfix"] a
3. Added a 6-second sleep. The right duration depends on network speed (how long the page takes to load).
4. Added try / catch / finally. The catch lets the loop continue with the next search even if one step crashes.
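As a possible alternative to the fixed 6-second sleep, here is a minimal sketch that waits for the listings with page.waitForSelector and reads every listing in one page.$$eval call. The names extractListings and scrapeSearch, the 10-second timeout, and the selectors (taken from the HTML in the question) are assumptions for illustration and may need adjusting to the real page.

```javascript
// Runs inside the page, so it must be self-contained:
// Puppeteer serializes the function and cannot see outer variables.
function extractListings(sections) {
  // Return the trimmed text of the first match, or null if nothing matches.
  const text = (root, selector) => {
    const node = root.querySelector(selector);
    return node ? node.innerText.trim() : null;
  };
  return sections.map((section) => {
    const img = section.querySelector('img');
    const link = section.querySelector('a');
    return {
      listImg: img ? img.getAttribute('src') : null,
      listTitle: text(section, 'h3.items-box-name'),
      listPrice: text(section, 'div.items-price'),
      listUrl: link ? link.getAttribute('href') : null,
    };
  });
}

// Hypothetical helper: open one search URL and return its listings.
async function scrapeSearch(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // Wait until at least one listing exists (assumed 10 s limit).
    await page.waitForSelector('section.items-box', { timeout: 10000 });
    return await page.$$eval('section.items-box', extractListings);
  } finally {
    await page.close(); // note the parentheses: this actually closes the tab
  }
}
```

A missing element yields null here instead of throwing, so one incomplete listing does not abort the whole search.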
Upvotes: 2
Reputation: 25260
The problem is right there in the error message (Error: failed to find element matching selector ...).
The selectors are wrong in the following lines:
let listImg = await element.$eval(('img[class="items-box-photo]'), img => img.getAttribute('src'));
let listTitle = await element.$eval(('d[class="items-box-main"] > h[class="items-box-name"]'), node => node.innerText.trim());
let listPrice = await element.$eval(('d[class="items-box-figure"] > d[class="items-price"]'), node => node.innerText.trim());
let listUrl = await element.$eval(('d[class="items-box-content"] > a[class*="items-box"]'), node => node.getAttribute('href'));
According to the HTML code you have given, these should be:
let listImg = await element.$eval('img.lazyloaded', img => img.getAttribute('src'));
let listTitle = await element.$eval('h3.items-box-name', node => node.innerText.trim());
let listPrice = await element.$eval('div.items-price', node => node.innerText.trim());
let listUrl = await element.$eval('div.items-box-content a', node => node.getAttribute('href'));
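Note that instead of using [class="..."], the proper way to query a class is the class selector, for example .items-price. The attribute form compares the entire class attribute string, so it fails as soon as the element carries more than one class.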
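To make the difference concrete, here is a small sketch that models how the three selector forms in this thread treat a class attribute. It is plain JavaScript operating on strings, not a real selector engine, and the function names are made up for illustration.

```javascript
// [class="value"]: the whole attribute string must be identical.
const exactAttrMatch = (classAttr, value) => classAttr === value;

// .token: the token must be one of the whitespace-separated classes.
const classTokenMatch = (classAttr, token) =>
  classAttr.split(/\s+/).filter(Boolean).includes(token);

// [class*="part"]: the attribute string only needs to contain the substring.
const substringMatch = (classAttr, part) => classAttr.includes(part);

const priceClasses = 'items-price font-4'; // class attribute from the question's HTML

console.log(exactAttrMatch(priceClasses, 'items-price'));  // false
console.log(classTokenMatch(priceClasses, 'items-price')); // true
console.log(substringMatch('items-box-photo', 'items-box')); // true
```

The last line also shows why div[class*="items-box"] in the question is broader than it looks: it matches items-box-content, items-box-main, and items-box-figure alike.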
Upvotes: 1