Reputation: 11
I'm trying to develop a little crawling/scraping project with Crawlee and Playwright in JavaScript/TypeScript. For each URL I feed the crawler, it tries to scrape some data like this:
const productDescriptionContainer = page.locator(
    'div[class="product-details__product-description"]'
  ),
  region = await productDescriptionContainer
    .locator("p")
    .filter({ hasText: "Región:" })
    .textContent(),
  farm = await productDescriptionContainer
    .locator("p")
    .filter({ hasText: "Finca:" })
    .textContent();
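For context, that snippet runs inside a Crawlee request handler, roughly like this (a simplified sketch assuming PlaywrightCrawler; the URL and logging are placeholders, my real handler does more):

import { PlaywrightCrawler } from "crawlee";

const crawler = new PlaywrightCrawler({
  async requestHandler({ page, request, log }) {
    // ...the locator/textContent code from above goes here...
    log.info(`Scraped ${request.url}`);
  },
});

await crawler.run(["https://example.com/some-product"]); // placeholder URL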
The problem comes when one of the locators is not found on the page. The crawler retries 3 times and then completely stops the scraping process for that specific URL. I would like to set those variables to some default value if the locator is not found and continue with the next one.
I hope you can shed some light on this because I've run out of ideas (catching the error, using ||, initialising the variables...). Thank you in advance.
Upvotes: 1
Views: 22170
Reputation: 917
Just remove browser.close();
Or set the timeout to 0 where you want to wait indefinitely:
const response = await page.waitForResponse('**/api/posts', { timeout: 0 });
Upvotes: 0
Reputation: 101
Here is the solution I found for this.
Include a .catch on the awaited call to avoid throwing an error, so the code continues.
Applied to your sample, it would look like this:
// When the locator is not found, the catch handler runs and the variable stays undefined
region = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Región:" })
  .textContent()
  .catch((e) => console.log(e)),
farm = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Finca:" })
  .textContent()
  .catch((e) => console.log(e));
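If you want an explicit default value instead of undefined, the catch handler can return one. Continuing from the snippet above (productDescriptionContainer comes from your sample, and the "Unknown" fallback is just an example):

// Fall back to a default string when the paragraph is missing or times out
const region = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Región:" })
  .textContent()
  .catch(() => "Unknown");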
Upvotes: 4