Estratachuela

Reputation: 11

Get the text content from a locator (Playwright and Crawlee) and default to a specific value if that locator is not found

I'm trying to develop a small crawling/scraping project with Crawlee and Playwright in JavaScript/TypeScript. For each URL I feed the crawler, it tries to scrape some data like this:

productDescriptionContainer = await page.locator(
      'div[class="product-details__product-description"]'
    ),
    region = await productDescriptionContainer
      .locator("p")
      .filter({ hasText: "Región:" })
      .textContent(),
    farm = await productDescriptionContainer
      .locator("p")
      .filter({ hasText: "Finca:" })
      .textContent(),

The problem comes when one of the locators is not found on the page. The crawler retries 3 times and then completely stops the scraping process for that specific URL. I would like to set those variables to some default value if the locator is not found and continue with the next one.

I hope you can shed some light on this, because I've run out of ideas (catching the error, using ||, initialising the variables...). Thank you in advance.

Upvotes: 1

Views: 22170

Answers (2)

Hyzyr

Reputation: 917

Just remove browser.close();

Or set timeout to 0 wherever you want to wait indefinitely:

const response = await page.waitForResponse('**/api/posts', { timeout: 0 });

Upvotes: 0

Here is the solution I found for this.

Add a .catch to the awaited call so that a rejection is handled instead of thrown, and the code continues.

In your sample, it would look like this:

// .catch must chain directly onto textContent(); on failure the error is
// logged and the variable is left undefined.
region = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Región:" })
  .textContent()
  .catch((e) => console.log(e)),
farm = await productDescriptionContainer
  .locator("p")
  .filter({ hasText: "Finca:" })
  .textContent()
  .catch((e) => console.log(e)),
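
Since the question asks for a default value rather than undefined, the same idea can be wrapped in a small helper. This is only a sketch: textOrDefault, the TextSource type, the 2000 ms timeout and the "unknown" fallback are names and values I made up for illustration, not part of Playwright or Crawlee. The only Playwright call it relies on is locator.textContent({ timeout }).

```typescript
// Minimal structural type covering the single Locator method used here;
// a real Playwright Locator satisfies it.
interface TextSource {
  textContent(options?: { timeout?: number }): Promise<string | null>;
}

// Hypothetical helper: resolves to the trimmed text, or to `fallback` when
// the lookup rejects (e.g. on timeout) or yields null or an empty string.
async function textOrDefault(
  source: TextSource,
  fallback: string,
  timeoutMs = 2000, // assumed value, chosen so that a missing element fails fast
): Promise<string> {
  try {
    const text = await source.textContent({ timeout: timeoutMs });
    return text?.trim() || fallback;
  } catch {
    return fallback;
  }
}

// Usage inside the request handler (names taken from the question):
// const region = await textOrDefault(
//   productDescriptionContainer.locator("p").filter({ hasText: "Región:" }),
//   "unknown",
// );
```

Note that the catch swallows every error, not only "element not found", so a typo in the selector would also silently produce the fallback. Logging inside the catch block, as in the snippet above, may be worth keeping while debugging.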

Upvotes: 4
