Jayex Designs

Reputation: 13

Getting the html of a page that has a loading screen

I'm trying to get the HTML of an Instagram profile page, but when I use the requests library it returns the HTML of the loading screen. I want the HTML of the page after it has finished loading. This is my code:

from bs4 import BeautifulSoup
import requests

source = requests.get("https://www.instagram.com/ethieen/").text
soup = BeautifulSoup(source,"lxml")
body = soup.find("body")

print(body.prettify())

Upvotes: 1

Views: 220

Answers (2)

Ahmed ElMetwally

Reputation: 2383

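This page is loaded by JavaScript (AJAX), so requests only sees the initial shell. You can get the rendered HTML with Puppeteer, which drives a headless Chrome browser: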

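const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until network activity settles so the JavaScript-rendered content is present
  await page.goto('https://www.instagram.com/ethieen', {waitUntil: 'networkidle2'});
  // page.content() returns the full HTML of the page after its scripts have run
  const html = await page.content();
  console.log(html);

  await browser.close();
})();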

Upvotes: 0

capek

Reputation: 271

The site probably uses JavaScript to load its content, so you won't be able to access that content with BeautifulSoup, since BeautifulSoup only parses HTML and does not execute JavaScript.
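A quick way to see this for yourself: the sketch below feeds a made-up "loading screen" page (the markup is hypothetical, not Instagram's actual HTML) to Python's built-in `html.parser`, which is one of the parsers BeautifulSoup can use under the hood. The script that would insert the `<p>` element is treated as plain text and never run, so the parser never sees a `<p>` tag.

```python
from html.parser import HTMLParser

# Hypothetical "loading screen" page: the profile markup only appears
# after the inline script runs in a real browser.
SHELL_HTML = """
<html><body>
  <div id="root">Loading...</div>
  <script>
    document.getElementById("root").innerHTML = "<p class='bio'>Profile bio</p>";
  </script>
</body></html>
"""


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(SHELL_HTML)
collector.close()

# The <p> exists only inside the script's string literal. The parser treats
# <script> contents as raw text and never executes them, so no <p> is found.
print(collector.tags)          # ['html', 'body', 'div', 'script']
print("p" in collector.tags)   # False
```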

To test this, disable JavaScript in your browser and then navigate to that page. Whatever still shows up is what you can access via BeautifulSoup.
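If you'd rather do a similar check from Python, here is a rough sketch: it keeps only the text that sits outside `<script>`/`<style>` tags and looks for something you expect to see on the loaded page. `visible_without_js` and the two sample pages are hypothetical, for illustration only; on a real site you would pass in `requests.get(url).text` instead. A `False` result only suggests the content is rendered by JavaScript; it doesn't prove it.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside <script> or <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style contents and whitespace-only text nodes
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_without_js(html: str, needle: str) -> bool:
    """Return True if `needle` appears in the page's static (non-script) text."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return needle in " ".join(parser.chunks)


# Hypothetical sample pages standing in for real responses
static_page = "<html><body><h1>ethieen</h1><p>Posts</p></body></html>"
shell_page = """<html><body><div id="root">Loading...</div>
<script>render("ethieen")</script></body></html>"""

print(visible_without_js(static_page, "ethieen"))  # True
print(visible_without_js(shell_page, "ethieen"))   # False
```

The name only appears inside the script in `shell_page`, so it counts as "not available without JavaScript", just like content that disappears when you turn JS off in the browser.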

Upvotes: 1
