Neo Mosaid

Reputation: 419

How to get the HTML source code of a web page

I was using curl to scrape HTML from a certain website. Then they changed their server settings, and curl can no longer get the page content; it fails with error code 1020. So I changed my script to use elinks.

But now they are using Cloudflare, and elinks no longer works either (only on this particular website). It gives the same error code 1020.

Is there a command-line tool or option for using another browser (Firefox, Chromium, Google Chrome...) to get the page HTML in a terminal?

Upvotes: 1

Views: 1129

Answers (2)

Roma N

Reputation: 320

Here are libraries and code that can bypass Cloudflare protection:

Libraries:

npm i puppeteer-extra puppeteer-extra-plugin-stealth puppeteer

Node.js:

const puppeteer = require('puppeteer-extra')
const pluginStealth = require('puppeteer-extra-plugin-stealth')
const { executablePath } = require('puppeteer')

// Register the stealth plugin once, before any browser is launched
puppeteer.use(pluginStealth())

const link = 'https://www.g2.com/'

const getHtmlThoughCloudflare = async (url) => {
  // Use the browser binary that the puppeteer package downloaded
  const browser = await puppeteer.launch({ headless: true, executablePath: executablePath() })
  try {
    const page = await browser.newPage()
    await page.goto(url)
    const html = await page.content()
    console.log(` HTML: ${html}`)
    return html
  } finally {
    // Close the browser even if navigation fails
    await browser.close()
  }
}

getHtmlThoughCloudflare(link).catch(console.error)

Upvotes: 0

vsemozhebuty

Reputation: 13822

If you can write scripts for Node.js, here is a small example using the puppeteer library. It logs the page source after the page has loaded in headless (invisible) Chrome, including dynamic content generated by page scripts:

import puppeteer from 'puppeteer';

const browser = await puppeteer.launch({ headless: true, defaultViewport: null });

try {
  const [page] = await browser.pages();
  await page.goto('https://example.org/');
  console.log(await page.content());
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}
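
Since the question asks for something usable from a terminal, the same idea can be wrapped in a small command-line script that takes the URL as an argument and writes the HTML to stdout. This is only a sketch under a few assumptions: puppeteer is installed, the file name get-html.js and the helper names parseUrlArg and fetchHtml are made up for illustration, and waiting for 'networkidle2' is just one reasonable choice of load condition. It is not guaranteed to get past Cloudflare on any particular site.

```javascript
// Hypothetical helper: read the target URL from argv and accept only http(s).
function parseUrlArg(argv) {
  const raw = argv[2];
  if (!raw) return null;
  try {
    const url = new URL(raw);
    const isWeb = url.protocol === 'http:' || url.protocol === 'https:';
    return isWeb ? url.href : null;
  } catch {
    // new URL() throws on strings that are not valid URLs
    return null;
  }
}

// Hypothetical helper: load the page in headless Chrome and return its HTML.
async function fetchHtml(url) {
  // Loaded lazily so the argument check works even before puppeteer is needed
  const { default: puppeteer } = await import('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled so page scripts can run
    await page.goto(url, { waitUntil: 'networkidle2' });
    return await page.content();
  } finally {
    // Close the browser even if navigation fails
    await browser.close();
  }
}

const target = parseUrlArg(process.argv);
if (target) {
  fetchHtml(target)
    .then((html) => process.stdout.write(html))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
} else {
  console.error('Usage: node get-html.js <http(s) url>');
}
```

With that saved as get-html.js, something like `node get-html.js https://example.org/ > page.html` should drop the rendered HTML into a file, and errors go to stderr so they do not end up mixed into the output.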

Upvotes: 1
