Reputation: 1978
Context: I'm writing a tool that reads HTML and outputs a JSON object summarising elements that match a selector. The expected output of the tool is shown below:
example.html
<html>
<body>
<p class="test">first</p>
<p id="second">second</p>
</body>
</html>
cat example.html | mytool 'p'
[
{"text": "first", "class": "test"},
{"text": "second", "id": "second"}
]
I have implemented code to select the matching elements (the equivalent of document.querySelectorAll).
Problem
How do I detect which HTML attributes are set on an element via an ElementHandle, and return them as an object? I only want to return attributes that are set, i.e. those with a name=value mapping written in the HTML.
The element HTML and the expected function output are shown below:
<p class="test">first</p>
{"class": "test", "text": "first"}
<p id="second">second</p>
{"text": "second", "id": "second"}
Is this possible? I'd appreciate any help you could offer.
Upvotes: 0
Views: 440
Reputation: 13782
You can try Element.attributes:
import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
const html = `
  <!doctype html>
  <html>
    <body>
      <p id="first" class="test">first</p>
      <p id="second" class="test" hidden>second</p>
    </body>
  </html>`;

try {
  const [page] = await browser.pages();
  await page.goto(`data:text/html,${html}`);

  // Runs inside the page: build one { name: value } object per matched element.
  const data = await page.evaluate(
    (selector) => Array.from(
      document.querySelectorAll(selector),
      // Start from {} so an element with no attributes yields {} instead of throwing.
      element => Object.assign({}, ...Array.from(
        element.attributes,
        // Drop attributes written without a value, such as `hidden`.
        ({ name, value }) => (value !== '' ? { [name]: value } : {})
      ))
    ),
    'p'
  );
  console.log(data);
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}
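The snippet above returns only the attributes, while the question also wants the element text in the same object. A minimal sketch of that per-element step follows. The name summariseElement is hypothetical, and the sample elements are plain-object stand-ins rather than real DOM nodes, so the sketch runs without a browser. It keeps the same value !== '' filter as above.

```javascript
// Hypothetical helper: turn one element into { text, ...attributes }.
// It references nothing outside its own body, so it can be serialised into a page.
function summariseElement(element) {
  const summary = { text: (element.textContent || '').trim() };
  for (const { name, value } of Array.from(element.attributes)) {
    // Skip attributes written without a value, such as `hidden`.
    if (value !== '') summary[name] = value;
  }
  return summary;
}

// Plain-object stand-ins for DOM elements (an assumption made for illustration).
const first = {
  textContent: 'first',
  attributes: [{ name: 'class', value: 'test' }],
};
const second = {
  textContent: 'second',
  attributes: [{ name: 'id', value: 'second' }, { name: 'hidden', value: '' }],
};

console.log(JSON.stringify([first, second].map(summariseElement)));
// prints [{"text":"first","class":"test"},{"text":"second","id":"second"}]
```

With Puppeteer, the same function could probably be passed to elementHandle.evaluate(summariseElement) for each handle returned by page.$$(selector), since the function is serialised and run in the page; I have not tested that here. Note that an attribute literally named text would overwrite the text key.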
Upvotes: 2