Ilias Tsoumas

Reputation: 41

HTML frame scraping using Python

We're trying to access an HTML page and get its content using Python. We run into problems when it comes to loading frames. The code is:

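import cookielib
import urllib
import urllib2

URL = "http://192.168.1.48/_pnt_log.html"
username = "11111"
password = "1"

# Keep cookies between requests so the login session is reused
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username': username, 'j_password': password})
try:
    # Log in first, then request the page that holds the frameset
    opener.open('http://192.168.1.48/_top.html', login_data)
    resp = opener.open('http://192.168.1.48/_dept.html?dn=1')
except urllib2.URLError as e:
    print("Request failed: %s" % e)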

The HTML received is the following:

<html>
<head>
 <meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
 <title>Remote UI<Additional Functions>:  : imageRUNNER2520</title>
</head>
<frameset cols="175,*" bordercolor="white" border="0" framespacing="0" frameborder="0">
 <frame src="index06_02.html" name="Menu" scrolling="AUTO" noresize>
 <frame src="dept.html?dn=1" name="body" noresize>
 <noframes>
  <body bgcolor="white">
  </body>
 </noframes>
</frameset>
</html>

I want the content of dept.html?dn=1, which is not loaded by this request. Is there any way to get the content the way a browser does?

Upvotes: 0

Views: 1154

Answers (1)

Ilias Tsoumas

Reputation: 41

In the end, the "problem" was how the Canon printer's page keeps its cookies and how sessions are opened and closed with urllib2.

I solved the problem using the Selenium Python library: http://selenium-python.readthedocs.io/

With Selenium I take the HTML from the browser and sidestep the permissions problem, because everything runs in the same browser session.

from selenium import webdriver

##OPEN BROWSER##
driver = webdriver.Firefox()
##LOGIN##
driver.get("http://192.168.1.48/_top.html")
driver.find_element_by_name('user_name').send_keys("11111")
driver.find_element_by_name('pwd').send_keys("1")
driver.find_element_by_xpath("/html/body/form/center/p[1]/table/tbody/tr[2]/td/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr[13]/td[3]/a/img").click()
driver.get("http://192.168.1.48/dept.html?dn=1")
##GET HTML##
elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

##SAVE HTML##
f = open('/home/itsoum/PrinterProject/html_source_code.html', 'w')
f.write(source_code.encode('utf-8'))
f.close()

driver.quit()
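
For reference, a browser loads a frameset by resolving each frame's src against the URL of the page that contains it and then requesting those URLs separately. The sketch below shows only that resolution step, using the Python 3 standard library (not the Python 2 urllib2 used in the question). The names FrameSrcParser and frame_urls are made up for this illustration, and the sample HTML is a trimmed copy of the frameset from the question. It does not log in or fetch anything, so the printer's session handling would still have to work for the resolved URL to return content.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect (name, src) pairs from <frame> and <iframe> tags."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attr_map = dict(attrs)
            src = attr_map.get("src")
            if src:
                self.frames.append((attr_map.get("name"), src))


def frame_urls(page_url, html):
    """Map each frame name to its absolute URL, resolved against page_url."""
    parser = FrameSrcParser()
    parser.feed(html)
    return {name: urljoin(page_url, src) for name, src in parser.frames}


# Trimmed copy of the frameset returned in the question
FRAMESET_HTML = """
<html>
<frameset cols="175,*" bordercolor="white" border="0">
 <frame src="index06_02.html" name="Menu" scrolling="AUTO" noresize>
 <frame src="dept.html?dn=1" name="body" noresize>
</frameset>
</html>
"""

PAGE_URL = "http://192.168.1.48/_dept.html?dn=1"

urls = frame_urls(PAGE_URL, FRAMESET_HTML)
for name, url in urls.items():
    print(name, url)
```

The URL resolved for the "body" frame is the same address that the Selenium script above opens directly with driver.get.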

Upvotes: 1
