List element attributes in Puppeteer

Context: I'm writing a tool that reads HTML and outputs a JSON object summarising elements that match a selector. The expected output of the tool is shown below:

example.html

<html>
  <body>
    <p class="test">first</p>
    <p id="second">second</p>
  </body>
</html>

cat example.html | mytool 'p'
[
  {"text": "first", "class": "test"},
  {"text": "second", "id": "second"}
]

I have implemented code to:

Problem

How do I detect which HTML attributes are set on an element through an ElementHandle, and return them as an object? I only want the attributes that are actually set, i.e. those with a name=value mapping written in the HTML.

The element HTML and the expected function output are shown below:

<p class="test">first</p>
{"text": "first", "class": "test"}
<p id="second">second</p>
{"text": "second", "id": "second"}

Is this possible? I'd appreciate any help you could offer.

Upvotes: 0


Answers (1)

vsemozhebuty

You can try Element.attributes:

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();

const html = `
  <!doctype html>
  <html>
    <body>
      <p id="first" class="test">first</p>
      <p id="second" class="test" hidden>second</p>
    </body>
  </html>`;

try {
  const [page] = await browser.pages();

  // setContent loads the markup directly, so it needs no URL-encoding for a data: URL.
  await page.setContent(html);

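  // Element.attributes only lists attributes that are present in the markup.
  const data = await page.evaluate(
    (selector) => Array.from(
      document.querySelectorAll(selector),
      // The leading {} keeps Object.assign from throwing on an element with no attributes.
      element => Object.assign({}, ...Array.from(
        element.attributes,
        // Skip attributes with an empty value, e.g. the boolean `hidden`.
        ({ name, value }) => (value !== '' ? { [name]: value } : {})
      ))
    ),
    'p'
  );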
  console.log(data);
} catch(err) { console.error(err); } finally { await browser.close(); }
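
If you specifically need to go through ElementHandle and also want the text in the result, as in the question's expected output, something along these lines might work. This is only a sketch: the helper names summarizeElement, summarizeHandle and summarizeSelector are made up for illustration, and it assumes the trimmed textContent is what you mean by "text". It relies on page.$$ to get the handles and on ElementHandle.evaluate to run a function against each element.

```javascript
// Runs inside the page: returns { text, ...attributes } for one element.
// It is self-contained on purpose: Puppeteer serializes the function,
// so it cannot refer to variables from the Node.js scope.
const summarizeElement = (element) => {
  // Assumption: "text" means the element's trimmed textContent.
  const summary = { text: (element.textContent ?? '').trim() };
  for (const { name, value } of Array.from(element.attributes)) {
    summary[name] = value; // boolean attributes such as `hidden` yield ''
  }
  return summary;
};

// Hypothetical helper: accepts an ElementHandle (or anything else with
// a compatible evaluate(fn) method) and resolves to the summary object.
const summarizeHandle = (handle) => handle.evaluate(summarizeElement);

// Hypothetical helper: summarizes every element matching the selector.
const summarizeSelector = async (page, selector) => {
  const handles = await page.$$(selector);
  return Promise.all(handles.map(summarizeHandle));
};
```

With the page from the code above, await summarizeSelector(page, 'p') should resolve to an array shaped like the one in the question. Note that an attribute literally named text would overwrite the text field, so a real tool may want to nest the attributes under their own key.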

Upvotes: 2
