minh

Reputation: 11

Web scraping - h1 - font access issues

I am new to web scraping and am having some issues accessing the HTML and CSS code.

I want to scrape the title from the following page: https://www.leaseplan.com/nl-nl/privatelease/onze-autos/4662/

Using Inspect in the browser, I found:

<h1 class="u-margin-bottom-none u-margin-bottom@tablet" data-e2e-id="carName"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Fiat 500 - Lounge Hybrid </font></font></h1>

I want to extract "Fiat 500 - Lounge Hybrid" and put it into an Excel file.
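Writing a native .xlsx file from Node needs a third-party package such as exceljs or SheetJS. If a CSV file that Excel can open is good enough, the export step can be sketched with only Node's built-in `fs` module. The two-column layout (url, title) and the function names `csvField`, `toCsvRow` and `writeTitlesCsv` are assumptions made for illustration.

```javascript
const fs = require("fs");

// Quote one CSV field: wrap it in double quotes and double any embedded
// quotes, so a comma or quote inside a title cannot split the row.
function csvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Join several fields into one CSV line.
function toCsvRow(values) {
  return values.map(csvField).join(",");
}

// Write a header plus one [url, title] row per car (hypothetical layout).
// The leading byte-order mark helps Excel recognise the file as UTF-8,
// which matters for car names with accented characters.
function writeTitlesCsv(filePath, rows) {
  const lines = [toCsvRow(["url", "title"]), ...rows.map(toCsvRow)];
  fs.writeFileSync(filePath, "\uFEFF" + lines.join("\r\n") + "\r\n", "utf8");
}
```

For example, `writeTitlesCsv("cars.csv", [[url, title]])` produces a file that opens in Excel with one header row and one data row.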

I used:

async function getPageData(url, page) {
  await page.goto(url);

  const title = await page.$eval(
    "h1, id=carName",
    (title) => title.textContent
  );
}

However, I keep getting this error:

(node:6216) UnhandledPromiseRejectionWarning: Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': 'h1[class=u-margin-bottom-none u-margin-bottom@tablet] .carName' is not a valid selector.
    at __puppeteer_evaluation_script__:1:33

The problem seems to be the selector "h1, id=carName", but I'm not sure how to write it so that my scraper finds the heading and extracts "Fiat 500 - Lounge Hybrid".

Upvotes: 0

Views: 79

Answers (1)

vmank

Reputation: 784

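The h1 doesn't have an id; carName is the value of its data-e2e-id data attribute. In "h1, id=carName" the comma separates two selectors, and id=carName is not valid CSS, so querySelector rejects it. Use an attribute selector instead: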

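async function getPageData(url, page) {
  await page.goto(url);

  // carName is the value of the data-e2e-id attribute, not an id
  const title = await page.$eval(
    'h1[data-e2e-id="carName"]',
    (el) => el.textContent.trim() // includes text of nested <font> elements
  );

  return title;
}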
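A slightly fuller sketch of the same approach, assuming the caller owns one Puppeteer `browser`: each lookup opens a tab, reads the heading with the attribute selector, trims the trailing space visible in the inspected markup, and closes the tab in a `finally` block. Because `textContent` returns the text of all descendant nodes, the nested `<font>` elements need no special handling. `scrapeCarName` and `main` are hypothetical names; the Puppeteer calls used (`launch`, `newPage`, `goto`, `$eval`, `close`) are the library's standard ones.

```javascript
// Open a fresh tab, read the car name, and always close the tab again.
async function scrapeCarName(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    // $eval rejects when nothing matches the selector; the rejection
    // propagates to the caller instead of being swallowed here.
    const name = await page.$eval(
      'h1[data-e2e-id="carName"]',
      (el) => el.textContent
    );
    return name.trim(); // drop surrounding whitespace such as a trailing space
  } finally {
    await page.close(); // runs whether goto/$eval succeeded or threw
  }
}

// Hypothetical entry point: pass in the module returned by require("puppeteer").
async function main(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    return await scrapeCarName(browser, url);
  } finally {
    await browser.close(); // never leave a headless browser running
  }
}
```

Calling it as `main(require("puppeteer"), url).then(console.log).catch(console.error)` attaches a rejection handler, which is what keeps a failed selector from surfacing as an `UnhandledPromiseRejectionWarning` like the one in the question.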

There are various ways to select an element; take a look here for a quick reference.
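For comparison, the selector in the question's error message is invalid because an unquoted attribute value cannot contain spaces or an "@" sign, and `.carName` matches a class rather than the data attribute. The sketch below lists a few valid alternatives for the same heading (in a class selector the "@" has to be escaped as `\@`, written `\\@` inside a JavaScript string) and a helper that tries them in order with `page.$`, which resolves to `null` when nothing matches, and `ElementHandle.evaluate`. The helper name and the fallback strategy are illustrative assumptions, not part of the answer.

```javascript
// A few equivalent, valid CSS selectors for the same heading.
const CAR_NAME_SELECTORS = [
  'h1[data-e2e-id="carName"]', // tag + exact data attribute value
  '[data-e2e-id="carName"]',   // data attribute alone
  "h1.u-margin-bottom-none.u-margin-bottom\\@tablet", // both classes, "@" escaped
];

// Try each selector in order and return the trimmed text of the first match,
// or null if none of them matches. `page` is assumed to be a Puppeteer Page.
async function textOfFirstMatch(page, selectors) {
  for (const selector of selectors) {
    const handle = await page.$(selector); // null when nothing matches
    if (handle !== null) {
      const text = await handle.evaluate((el) => el.textContent);
      return text.trim();
    }
  }
  return null;
}
```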

Upvotes: 1
