Reputation: 13
I'm trying to get the HTML of an Instagram profile page, but when I use the requests library I only get the HTML of the loading screen. I want the HTML of the page after it has finished loading. This is my code:
from bs4 import BeautifulSoup
import requests
source = requests.get("https://www.instagram.com/ethieen/").text
soup = BeautifulSoup(source,"lxml")
body = soup.find("body")
print(body.prettify())
Upvotes: 1
Views: 220
Reputation: 2383
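This page is loaded by JavaScript (AJAX), so requests only receives the initial HTML. You can get the rendered page with Puppeteer (Node.js), which drives a headless Chrome: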
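const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until the network is (almost) idle so the JavaScript-rendered content is in place
  await page.goto('https://www.instagram.com/ethieen', {waitUntil: 'networkidle2'});
  // page.content() returns the full HTML of the page after its scripts have run
  const html = await page.content();
  console.log(html);
  await browser.close();
})();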
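Since the question is in Python, the same idea can be sketched with Playwright for Python instead of Puppeteer. This is a minimal sketch under assumptions: it requires `pip install playwright` followed by `playwright install chromium`, and the helper name `fetch_rendered_html` is hypothetical. `page.content()` returns the serialized DOM after the page's scripts have run.

```python
def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Return the page's HTML after its JavaScript has run (hypothetical helper)."""
    # Imported lazily so this module still loads where Playwright isn't installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # "networkidle" waits until there have been no network connections
            # for at least 500 ms, a slightly stricter cousin of networkidle2.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            # Serialized DOM after scripts have modified it.
            return page.content()
        finally:
            browser.close()
```

The returned string can then be passed to `BeautifulSoup(html, "lxml")` exactly as in the question's code.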
Upvotes: 0
Reputation: 271
The site probably builds the page with JavaScript, so you won't be able to access that content with BeautifulSoup: neither BeautifulSoup nor requests executes JavaScript.
To test this, disable JavaScript in your browser and then navigate to the page. Whatever still shows up is what you can access via BeautifulSoup.
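The same check can be approximated without a browser, because the HTML that requests downloads is the page as served before any JavaScript runs. A minimal sketch using only the standard library's `html.parser` (the names `VisibleTextParser` and `visible_in_static_html` are hypothetical) collects the text outside `<script>` and `<style>` tags and tests whether some text you expect on the loaded page is present:

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside <script> or <style> (hypothetical helper)."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that a no-JavaScript browser would display.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_in_static_html(html: str, expected: str) -> bool:
    """True if `expected` appears in the page's text without running any JavaScript."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return expected in " ".join(parser.chunks)


# Hypothetical loading-screen markup: the text only exists inside a script.
loading_screen = "<html><body><div id='root'></div><script>render('Posts')</script></body></html>"
print(visible_in_static_html(loading_screen, "Posts"))  # False
```

In practice you would pass `requests.get(url).text` as `html`; if the text you see in a normal browser is missing, the page needs a JavaScript-capable tool such as Puppeteer or Playwright.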
Upvotes: 1