wivku

Reputation: 2653

Puppeteer: wait for page/DOM updates and respond to new items added after the initial load

I want to use Puppeteer to respond to page updates. The page shows a list of items, and while it stays open new items can appear over time, e.g. a new item is added every 10 seconds.

I can use the following to wait for an item on the initial load of the page:

await page.waitForSelector(".item");
console.log("the initial items have been loaded");

How can I wait for / catch future items? I would like to achieve something like this (pseudo code):

await page.goto('http://mysite');
await page.waitForSelector(".item");
// check items (=these initial items)

// event when receiving new items:
// check item(s) (= the additional [or all] items)

Upvotes: 7

Views: 9489

Answers (3)

pguardiario

Reputation: 54984

A simpler idea: to wait for the text to change, use the :last-child selector and poll until the text of the last item differs from its original value:

await page.evaluate(sel => {
  let originalText = document.querySelector(sel).innerText
  return new Promise(resolve => {
    let interval = setInterval(() => {
      if(originalText !== document.querySelector(sel).innerText){
        clearInterval(interval)
        resolve()
      }
    }, 500)
  })
}, '.item:last-child')

Upvotes: 0

ggorlen

Reputation: 56925

As an alternative to the excellent current answer, which injects a MutationObserver with evaluate and forwards the data to an exposed Node function, Puppeteer offers a higher-level function called page.waitForFunction. It blocks on an arbitrary predicate and uses either a MutationObserver (MO) or requestAnimationFrame (RAF) under the hood to decide when to re-evaluate the predicate.

Calling page.waitForFunction in a loop might add overhead since each new call involves registering a fresh observer or RAF. You'd have to profile for your use case. This isn't something I'd worry much about prematurely, though.

That said, the RAF option may provide tighter latency than MO, at the cost of some extra CPU cycles spent polling constantly.

Here's a minimal example. The following script simulates a site that offers a periodically updating feed:

const wait = ms => new Promise(r => setTimeout(r, ms));
const r = (lo, hi) => ~~(Math.random() * (hi - lo) + lo);

const randomString = n =>
  [...Array(n)].map(() => String.fromCharCode(r(97, 123))).join("");

(async () => {
  for (let i = 0; i < 500; i++) {
    const el = document.createElement("div");
    document.body.appendChild(el);
    el.innerText = randomString(r(5, 15));
    await wait(r(1000, 5000));
  }
})();

The Puppeteer script below embeds a variant of that feed in a page and logs each new item as it appears:

const puppeteer = require("puppeteer");

const html = `<!DOCTYPE html>
<html><body><div class="container"></div><script>
const wait = ms => new Promise(r => setTimeout(r, ms));
const r = (lo, hi) => ~~(Math.random() * (hi - lo) + lo);
const randomString = n =>
  [...Array(n)].map(() => String.fromCharCode(r(97, 123))).join("")
;
(async () => {
  for (;;) {
    const el = document.createElement("div");
    document.querySelector(".container").appendChild(el);
    el.innerText = randomString(r(5, 15));
    await wait(r(1000, 5000));
  }
})();
</script></body></html>`;

let browser;
(async () => {
  browser = await puppeteer.launch({headless: false});
  const [page] = await browser.pages();
  await page.setContent(html);
  
  for (;;) {
    await page.waitForFunction((el, oldLength) =>
      el.children.length > oldLength,                           // predicate
      {polling: "mutation" /* or: "raf" */, timeout: 10**8},    // wFF options
      await page.$(".container"),                               // elem to watch
      await page.$eval(".container", el => el.children.length), // oldLength
    );
    const selMostRecent = ".container div:last-child";
    console.log(await page.$eval(selMostRecent, el => el.textContent));
  }
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Note that this example is contrived; if multiple items are added to the feed at once, an item can be skipped. It'd be safer to grab all items beyond the oldLength. You'll almost certainly need to adjust this code to match your feed's specific behavior.
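
One way to grab all items beyond the oldLength, as a minimal sketch rather than a definitive implementation. The helper name `nextBatch` and its parameters `containerSel` and `seen` are hypothetical; it assumes every new item is appended as a direct child of the container and that items are never removed.

```javascript
// Sketch: wait until the container has more children than `seen`,
// then return the text of every child from index `seen` onward.
// `page` is a Puppeteer Page; `nextBatch`, `containerSel` and `seen`
// are hypothetical names, not part of the Puppeteer API.
async function nextBatch(page, containerSel, seen) {
  await page.waitForFunction(
    (sel, oldLength) => {
      const el = document.querySelector(sel);
      return el !== null && el.children.length > oldLength;
    },
    {polling: "mutation", timeout: 0}, // timeout: 0 disables the timeout
    containerSel,
    seen,
  );
  // Read all children in one call, then keep only the unseen tail.
  const texts = await page.$$eval(`${containerSel} > *`, els =>
    els.map(el => el.textContent),
  );
  return texts.slice(seen);
}
```

In the loop above, you would start with `let seen = 0`, call `const batch = await nextBatch(page, ".container", seen)`, log each entry of `batch`, and then advance the counter with `seen += batch.length`. Since the count is taken from the same snapshot as the texts, items that arrive together end up in the same batch instead of being skipped.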

Upvotes: 3

hardkoded

Reputation: 21617

You can use exposeFunction to expose a local function:

await page.exposeFunction('getItem', function(a) {
    console.log(a);
});

Then you can use page.evaluate to create an observer in the page and listen for new nodes created inside a parent node.

This example (just an idea, not a finished solution) scrapes the Python chat room on Stack Overflow and prints new items as they are created in that chat.

const puppeteer = require('puppeteer');

(async () => {
    const baseurl = 'https://chat.stackoverflow.com/rooms/6/python';
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.goto(baseurl);

    // Runs in Node; the page can call it as window.getItem(...)
    await page.exposeFunction('getItem', function(a) {
        console.log(a);
    });

    // Runs in the browser: watch #chat and report nodes added beneath it
    await page.evaluate(() => {
        var observer = new MutationObserver((mutations) => {
            for (var mutation of mutations) {
                if (mutation.addedNodes.length) {
                    getItem(mutation.addedNodes[0].innerText);
                }
            }
        });
        observer.observe(document.getElementById("chat"), { attributes: false, childList: true, subtree: true });
    });
})();
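
The callback above forwards only the first entry of each record's addedNodes, and that entry can be a text or comment node with no innerText. As a hedged sketch, a helper like the following could flatten the records into the text of every added element node; the name `textsFromMutations` is hypothetical, and trimming and dropping empty strings are assumptions about what counts as an item.

```javascript
// Sketch: flatten MutationObserver records into the text of every added
// element node. `textsFromMutations` is a hypothetical helper name.
function textsFromMutations(mutations) {
  const texts = [];
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      // nodeType 1 is ELEMENT_NODE; text and comment nodes are skipped.
      if (node.nodeType !== 1) continue;
      const text = (node.innerText || node.textContent || "").trim();
      if (text) texts.push(text);
    }
  }
  return texts;
}
```

Because page.evaluate serializes only the function passed to it, the helper would have to be declared inside that callback, where the observer could then call `textsFromMutations(mutations).forEach(getItem)`.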

Upvotes: 9
