Alex

Reputation: 2780

Scrape Text From Iframe

How would I scrape text from an iframe with Puppeteer?

As a simple reproducible example, scrape the text "This is a paragraph" from the iframe at this URL:

https://www.w3schools.com/js/tryit.asp?filename=tryjs_events

Upvotes: 6

Views: 4672

Answers (2)

Gregor Ojstersek

Reputation: 1379

I know this question already has an answer, but if someone wants another approach, you can grab the content from the iframe and use cheerio to traverse the elements and get the text of any element you want. You can find it here.
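The linked write-up is not reproduced here, so the following is only a rough sketch of the idea rather than that author's code. It assumes the iframe can be reached through Puppeteer's ElementHandle.contentFrame() and that cheerio is installed. The function names textFromHtml and iframeTextViaCheerio, and the choice to pass cheerio's load function in as a parameter, are made up for illustration.

```javascript
// Sketch: pull the iframe's HTML out of Puppeteer, then query it with cheerio.
// textFromHtml and iframeTextViaCheerio are illustrative names, not library APIs.

// `load` is cheerio's load function (require("cheerio").load), passed in
// so the parsing step has no hard dependency on how cheerio is imported.
function textFromHtml(load, html, selector) {
  const $ = load(html);
  return $(selector).text().trim();
}

async function iframeTextViaCheerio(page, load, iframeSelector, innerSelector) {
  // ElementHandle for the <iframe> element, or null if nothing matches
  const iframeHandle = await page.$(iframeSelector);
  const frame = iframeHandle ? await iframeHandle.contentFrame() : null;
  if (!frame) {
    throw new Error(`No iframe content found for ${iframeSelector}`);
  }
  // frame.content() resolves to the frame's full HTML as a string
  const html = await frame.content();
  return textFromHtml(load, html, innerSelector);
}
```

With a Puppeteer page already open on the URL from the question, a call might look like iframeTextViaCheerio(page, require("cheerio").load, "#iframeResult", "#demo"). Once the HTML is in cheerio, any other element can be read from the same parsed document without going back to the browser.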

Upvotes: 1

Christian Santos

Reputation: 5456

To scrape an iframe's text in Puppeteer, you can use page.evaluate to run JavaScript in the context of the page and return the iframe's contents.

The steps to do so are:

  1. Grab the iframe element.
  2. Get the iframe's document object.
  3. Use the document object to read the iframe's HTML.

I wrote this program, which grabs "This is a paragraph" from the link you provided:

const puppeteer = require("puppeteer");

(async () => {

    const browser = await puppeteer.launch();

    const page = await browser.newPage();
    await page.goto('https://www.w3schools.com/js/tryit.asp?filename=tryjs_events');

    const iframeParagraph = await page.evaluate(() => {

        const iframe = document.getElementById("iframeResult");

        // grab iframe's document object (only accessible when the iframe is same-origin)
        const iframeDoc = iframe.contentDocument || iframe.contentWindow.document;

        const iframeP = iframeDoc.getElementById("demo");

        return iframeP.innerHTML;
    });

    console.log(iframeParagraph); // prints "This is a paragraph"

    await browser.close();

})();
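As an alternative sketch, Puppeteer can also hand back the iframe as a Frame object through ElementHandle.contentFrame(), so the text can be read with the frame's own $eval instead of reaching through contentDocument. This may help when the iframe is cross-origin and its document is not accessible from the outer page. The helper name readIframeText and its parameters are made up for illustration, and the selectors are the same ones used in the program above.

```javascript
// Sketch: read text from inside an iframe through Puppeteer's Frame API.
// readIframeText and its parameters are illustrative names, not part of Puppeteer.
async function readIframeText(page, iframeSelector, innerSelector) {
  // ElementHandle for the <iframe> element in the outer page
  const iframeHandle = await page.$(iframeSelector);
  if (!iframeHandle) {
    throw new Error(`No element matches ${iframeSelector}`);
  }

  // contentFrame() resolves to the Frame rendered inside that element, or null
  const frame = await iframeHandle.contentFrame();
  if (!frame) {
    throw new Error(`${iframeSelector} has no content frame`);
  }

  // Wait for the target element, then read its text in the frame's context
  await frame.waitForSelector(innerSelector);
  return frame.$eval(innerSelector, (el) => el.textContent.trim());
}

// Example driver; not invoked automatically. Call main() to run it.
async function main() {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto("https://www.w3schools.com/js/tryit.asp?filename=tryjs_events");
    console.log(await readIframeText(page, "#iframeResult", "#demo"));
  } finally {
    // Close the browser even if the lookup throws
    await browser.close();
  }
}
```

Calling main() runs the helper against the URL from the question. Using textContent rather than innerHTML returns plain text even if the paragraph contains nested markup.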

Upvotes: 2
