Milano

Reputation: 18745

Can't get HTML from GET request

I'm trying to get the HTML of a page that is probably generated dynamically. All I want is the HTML of the next page of results. If you click the next-page button, everything works perfectly, of course. But if you take the href of that button, paste it into the browser's address bar and submit it, you get text that looks like this:

{"paging":{"isLastPage":false},"pagination":{"firstUrl":"/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255/showmore?q=*&filter_Buyable=1&filter_Category4=Sp%C3%BA%C5%A1%C5%A5acie+obvody&filter_Category3=Anal%C3%B3gov%C3%A9+obvody&useTechnicalView=true&pageSize=10&page=1","prevUrl":"/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255/showmore? 

The same thing happens when you send the request with its headers.

What I want is the HTML of the page you get when you click the next-page button at the bottom of this page: http://www.distrelec.sk/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255

Do you know how to get that HTML?

EDIT: I've tried to find the GET request that loads the next page and to simulate the click with the requests module (sending all the request headers), but I got the same result. No HTML.

Upvotes: 2

Views: 450

Answers (3)

Igor Savinkin

Reputation: 6277

You have to understand how the HTML of page 2 is formed. The secret is not just to get the JSON of 'content 2' but also to insert it properly into the main HTML, replacing 'content 1'. Some JavaScript on the page is responsible for decoding the JSON and updating/replacing the content. You need to:

  • find out which functions replace 'content 1' with 'content 2' (by examining the HTML and the JS scripts) and what exactly they do
  • have your original HTML
  • get the JSON (as you've done)
  • simulate the replacement in the original HTML as a string, using Python or any other language. Use a regex for this, or, if you can parse the HTML into a DOM structure, use XPath.

That's a lot of work, and it helps to be comfortable with JS. :-)
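
The replacement step from the list above could be sketched roughly like this, using only the Python standard library. Everything specific here is an assumption for illustration: the container id `product-list`, the JSON key `html`, and both sample strings are made up, so the real names have to be taken from the site's HTML and scripts.

```python
import json
import re

# Hypothetical original page: the product list lives in one container.
original_html = (
    "<html><body>"
    '<div id="product-list"><p>content 1</p></div>'
    "<footer>footer</footer>"
    "</body></html>"
)

# Hypothetical JSON answer; "html" is an assumed key holding the new fragment.
json_response = '{"paging": {"isLastPage": false}, "html": "<p>content 2</p>"}'


def replace_container(page_html, fragment, container_id):
    """Swap the inner HTML of <div id="container_id"> for fragment.

    The regex stops at the first closing </div>, so it only works when
    the container holds no nested <div>; use a real HTML parser otherwise.
    """
    pattern = re.compile(
        r'(<div id="%s">).*?(</div>)' % re.escape(container_id),
        re.DOTALL,
    )
    # A function replacement keeps backslashes in fragment from being
    # interpreted as regex escape sequences.
    return pattern.sub(
        lambda m: m.group(1) + fragment + m.group(2), page_html, count=1
    )


fragment = json.loads(json_response)["html"]
page_2_html = replace_container(original_html, fragment, "product-list")
```

If the real container has nested elements, the same idea works better on a parsed DOM than on a raw string.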

Upvotes: 0

psvann

Reputation: 91

I can't reproduce the JSON result, but this worked for me using BeautifulSoup.

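from urllib.request import urlopen

from bs4 import BeautifulSoup

# the URL from the question
url = "http://www.distrelec.sk/sk/komponenty/aktivne-prvky/analogove-obvody/spustacie-obvody/c/cat-L3D_525255"
soup = BeautifulSoup(urlopen(url), "html.parser")

# this gives you the specific next link (matched by its CSS classes)
next_link_tags = soup.find_all("a", "btn btn-right js-page-link")
next_link_url = "http://www.distrelec.sk" + next_link_tags[0]["href"]

# fetch and print the HTML of the next page
html2 = BeautifulSoup(urlopen(next_link_url), "html.parser")
print(html2)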

Upvotes: 1

Spilker22

Reputation: 319

You're getting the JSON that probably helps generate the next page dynamically. If you just want to see the HTML of the next page, right-click and select "Inspect Element" (in Google Chrome, anyway) after the page loads.

But if you want the URL of the next page, the URLs are referenced inside the JSON.
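
A minimal sketch of pulling those URLs out of the JSON, using only the Python standard library. The sample payload is abbreviated and partly made up: `firstUrl` and `prevUrl` appear in the response quoted in the question, while `nextUrl` is an assumed key name and the paths are shortened placeholders, so check both against the real response.

```python
import json
from urllib.parse import urljoin

BASE_URL = "http://www.distrelec.sk"

# Abbreviated stand-in for the JSON the server returns.
# "nextUrl" is an assumed key; the paths are shortened placeholders.
sample_response = """
{
  "paging": {"isLastPage": false},
  "pagination": {
    "firstUrl": "/sk/c/cat-L3D_525255/showmore?pageSize=10&page=1",
    "prevUrl": "/sk/c/cat-L3D_525255/showmore?pageSize=10&page=1",
    "nextUrl": "/sk/c/cat-L3D_525255/showmore?pageSize=10&page=3"
  }
}
"""


def pagination_urls(raw_json, base_url=BASE_URL):
    """Return {key: absolute URL} for every string entry under "pagination"."""
    data = json.loads(raw_json)
    links = data.get("pagination", {})
    return {
        key: urljoin(base_url, path)
        for key, path in links.items()
        if isinstance(path, str)
    }


urls = pagination_urls(sample_response)
is_last_page = json.loads(sample_response)["paging"]["isLastPage"]
```

The relative paths are joined onto the site root because the JSON in the question only contains paths starting with `/sk/...`, not full URLs.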

Upvotes: 1
