Reputation: 11
I am new to web scraping and am having trouble accessing a page's HTML and CSS.
I want to scrape the below website for the title: https://www.leaseplan.com/nl-nl/privatelease/onze-autos/4662/
Upon clicking on inspect, I found
<h1 class="u-margin-bottom-none u-margin-bottom@tablet" data-e2e-id="carName"><font style="vertical-align: inherit;"><font style="vertical-align: inherit;">Fiat 500 - Lounge Hybrid </font></font></h1>
I want to extract "Fiat 500 - Lounge Hybrid" and put it into an Excel file.
I used:
async function getPageData(url, page) {
  await page.goto(url);
  const title = await page.$eval(
    "h1, id=carName",
    (title) => title.textContent
  );
}
However, I keep getting this error:
(node:6216) UnhandledPromiseRejectionWarning: Error: Evaluation failed: DOMException: Failed to execute 'querySelector' on 'Document': 'h1[class=u-margin-bottom-none u-margin-bottom@tablet] .carName' is not a valid selector.
at __puppeteer_evaluation_script__:1:33
The problem seems to be the selector "h1, id=carName", but I'm not sure how to write it so that my scraper recognizes the element and extracts Fiat 500 - Lounge Hybrid.
Upvotes: 0
Views: 79
Reputation: 784
The h1 doesn't have an id; carName is the value of its data-e2e-id attribute. Instead, you can match it with an attribute selector:
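async function getPageData(url, page) {
  await page.goto(url);
  // Match the h1 by its data-e2e-id attribute and read its text.
  const title = await page.$eval(
    'h1[data-e2e-id="carName"]',
    (title) => title.textContent
  );
  // Return the text so the caller can use it.
  return title;
}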
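To spell out why the original selector fails: "h1, id=carName" is parsed as a comma-separated selector list, and "id=carName" is not valid CSS. A real id would be written #carName. An attribute match needs square brackets, plus quotes around the value when it contains spaces or characters such as @, as the class value does here. Below is a minimal sketch of a helper that builds such a selector with the value quoted. The name attrSelector is made up for illustration and is not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): builds a CSS attribute
// selector such as h1[data-e2e-id="carName"] with the value quoted.
function attrSelector(tag, attr, value) {
  // Escape backslashes and double quotes so the value stays inside the quotes.
  const escaped = String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `${tag}[${attr}="${escaped}"]`;
}

console.log(attrSelector("h1", "data-e2e-id", "carName"));
console.log(attrSelector("h1", "class", "u-margin-bottom-none u-margin-bottom@tablet"));
```

The resulting string can be passed as the first argument to page.$eval in place of the hard-coded selector.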
There are various ways to select an element; take a look here for a quick reference.
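As for getting the title into Excel, one simple option is to write a CSV file, which Excel can open. This is a rough sketch using only Node's built-in fs module, on the assumption that a CSV is acceptable in place of a native .xlsx file. The helper names (csvField, titlesToCsv) and the file name cars.csv are made up for illustration.

```javascript
const fs = require("fs");

// Quote a field for CSV: wrap it in double quotes and double any embedded quotes.
function csvField(value) {
  return `"${String(value).replace(/"/g, '""')}"`;
}

// Hypothetical helper: turns scraped titles into CSV text with a header row.
function titlesToCsv(titles) {
  // The markup in the question shows a trailing space in the h1 text, hence trim().
  const rows = titles.map((t) => csvField(t.trim()));
  return ["title", ...rows].join("\r\n") + "\r\n";
}

// In a real run, the array would hold the values returned by getPageData.
const csv = titlesToCsv(["Fiat 500 - Lounge Hybrid "]);
fs.writeFileSync("cars.csv", csv); // output file name is an assumption
```

If you need a real .xlsx workbook instead, you would have to use a third-party spreadsheet library, which this sketch deliberately avoids.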
Upvotes: 1