Reputation: 123
I'm looking for a package or technique to automate web browsing. For example, I have these search results (sorry for the Russian): http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998
I want to retrieve the value of the variable “item.n” (line 399) from Python. It looks like an internal variable of the JavaScript function “onSearchLoaded”, but if you hover the mouse pointer over a search result you will see n=160111; that is the value of item.n I'm trying to get. Which Python packages could help me do that?
Upvotes: 1
Views: 88
Reputation:
You don't have to extract the JavaScript variable itself, only the place where its value is used. In this case it ends up in the href of each result link returned by the search.
There are a bunch of different libraries you can use for automation; which one fits depends on the level of automation you need. I prefer Selenium for this kind of task. Combine it with the standard-library re module and you have a basic example. Here is a quick mockup using Selenium:
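import re
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998"
pattern = re.compile(r"n=(\d+)")  # raw string so \d is not treated as a string escape
xpath = '//div[@id = "baseSrch"]//a'

browser = webdriver.Firefox()
browser.get(url)
# Read the href values while the browser is still open;
# element handles cannot be queried after the session ends.
elements = browser.find_elements(By.XPATH, xpath)
hrefs = [element.get_attribute("href") for element in elements]
browser.quit()

for href in hrefs:
    match = pattern.search(href or "")  # href may be None for anchors without one
    if match:
        print(match.group(1))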
Which yields:
160111
However, this isn't the only way: you could replace Selenium with urllib, requests, lxml, etc. There are many ways to extract this information.
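As a rough sketch of the browser-free route, the same href matching can be done with only the standard library once you have the page's HTML as a string. The names HrefCollector and extract_n_values and the sample markup are made up for illustration, not taken from the site, and this assumes the result links are actually present in the HTML you fetched. If the page fills them in with JavaScript, a plain urllib or requests download may not contain them, and you would need Selenium or the underlying search request instead.

```python
import re
from html.parser import HTMLParser

# \b keeps "n=" from matching inside a longer parameter name such as "pin=".
N_PATTERN = re.compile(r"\bn=(\d+)")


class HrefCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_n_values(html):
    """Return the digits following 'n=' in each link href, in document order."""
    collector = HrefCollector()
    collector.feed(html)
    values = []
    for href in collector.hrefs:
        match = N_PATTERN.search(href)
        if match:
            values.append(match.group(1))
    return values


# Hypothetical markup standing in for the real search results page.
sample_html = """
<div id="baseSrch">
  <a href="/cgi/online.cgi?req=doc;base=LAW;n=160111">result</a>
  <a href="/about/">link without an n parameter</a>
</div>
"""

print(extract_n_values(sample_html))  # prints ['160111']
```

To try it on a live page, you would pass the downloaded HTML text to extract_n_values instead of sample_html.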
Upvotes: 2