Reputation: 18745
I'm trying to get HTML that is probably generated dynamically. All I want is the HTML of the next page. If you click the button, everything works perfectly, of course. But if you copy the button's href, paste it into your browser's address bar and submit it, you get text that looks like this:
{"paging":{"isLastPage":false},"pagination":{"firstUrl":"/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255/showmore?q=*&filter_Buyable=1&filter_Category4=Sp%C3%BA%C5%A1%C5%A5acie+obvody&filter_Category3=Anal%C3%B3gov%C3%A9+obvody&useTechnicalView=true&pageSize=10&page=1","prevUrl":"/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255/showmore?
The same thing happens when you make a request with its headers.
What I want is the HTML of the page you get when you click the next-page button at the bottom of this page: http://www.distrelec.sk/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255
Do you know how to get that HTML?
EDIT: I've tried to find the GET request that loads the next page and to simulate the click with the requests module (with all the request headers), but I got the same result: no HTML.
Upvotes: 2
Views: 450
Reputation: 6277
You have to understand how the HTML of page 2 is formed.
The secret is not just to get the JSON of 'content 2' but also to insert it properly into the main HTML, replacing 'content 1'. Some JavaScript on the page is surely responsible for decoding the JSON and updating or replacing the content.
You need:
You sure have a lot of work ahead of you. Being an expert in JS. :-)
Upvotes: 0
Reputation: 91
I can't produce the JSON result, but this worked for me using BeautifulSoup.
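from urllib.request import urlopen
from bs4 import BeautifulSoup

# Category page from the question
url = "http://www.distrelec.sk/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255"
soup = BeautifulSoup(urlopen(url), "html.parser")

# Find the "next page" link by its exact class string
next_link_tags = soup.find_all("a", class_="btn btn-right js-page-link")
# The href is site-relative, so prepend the host
next_link_url = "http://www.distrelec.sk" + next_link_tags[0]["href"]

# Fetch and parse the next page
html2 = BeautifulSoup(urlopen(next_link_url), "html.parser")
print(html2)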
Upvotes: 1
Reputation: 319
You're getting the JSON that the site probably uses to generate the next page dynamically. If you just want to see the HTML of the next page, right-click and select "Inspect Element" (in Google Chrome, anyway) after the page loads.
But if you want the URL of the next page, the JSON contains the pagination URLs.
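As a rough sketch of that last point, something like the following could pull the pagination URLs out of the JSON and turn them into absolute links. The sample payload is made up to match the shape shown in the question; the "nextUrl" key and the placeholder paths are assumptions, since the response in the question is cut off after "prevUrl". The helper name pagination_urls is hypothetical too.

```python
import json
from urllib.parse import urljoin

BASE_URL = "http://www.distrelec.sk"  # site root taken from the question


def pagination_urls(payload, base_url=BASE_URL):
    """Return absolute URLs for every string entry under "pagination"."""
    data = json.loads(payload)
    pagination = data.get("pagination", {})
    return {
        key: urljoin(base_url, value)
        for key, value in pagination.items()
        if isinstance(value, str)
    }


# Shortened, made-up payload in the same shape as the response in the
# question; "nextUrl" is an assumed key and the paths are placeholders.
sample = json.dumps({
    "paging": {"isLastPage": False},
    "pagination": {
        "firstUrl": "/sk/example/showmore?page=1",
        "nextUrl": "/sk/example/showmore?page=3",
    },
})

urls = pagination_urls(sample)
print(urls.get("nextUrl"))
```

In practice you would pass the text of the real response instead of the sample, and check which keys it actually contains before relying on any of them.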
Upvotes: 1