user2598356

Reputation: 123

Automating web browsing on pages that use JavaScript, from Python

I'm looking for a package or approach to automate web browsing. For example, I have these search results (sorry, the page is in Russian): http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998

I want to retrieve the value of the variable “item.n” (line 399) from Python. It looks like an internal variable of the JavaScript function “onSearchLoaded”, but if you hover the mouse pointer over a search result you will see n=160111, which is the value of item.n I’m trying to get. Which Python packages could help me do that?

Upvotes: 1

Views: 88

Answers (1)

user3960432

You don't have to extract the JavaScript variable itself, only the place where its value is used. In this case the value ends up in the href of each result link returned by the search.
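
As a minimal sketch of that idea, assuming the href carries the value as a query parameter named n: the helper name extract_n and the sample URL below are made up for illustration, and the real link format on the site may differ.

```python
from urllib.parse import urlparse, parse_qs


def extract_n(href):
    """Return the value of the `n` query parameter in href, or None if absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("n")
    return values[0] if values else None


# Hypothetical href, not copied from the site.
sample_href = "http://example.com/document/?base=LAW&n=160111"
print(extract_n(sample_href))
```

Parsing the query string this way avoids accidentally matching another parameter whose name merely ends in "n".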

There are a bunch of different libraries you can use for automation; which one fits depends on the level of automation you need. I prefer Selenium for this kind of task. Couple it with Python's built-in re module and you can put together a basic example. Here is a quick mockup using Selenium:

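from selenium import webdriver
from selenium.webdriver.common.by import By
import re

url = "http://www.consultant.ru/search/?q=N+145-%D0%A4%D0%97+%D0%BE%D1%82+31.07.1998"
pattern = re.compile(r"n=(\d+)")  # the id appears as n=<digits> in each result link
xpath = '//div[@id = "baseSrch"]//a'

browser = webdriver.Firefox()
try:
    browser.get(url)
    # Read the hrefs while the browser is still open; elements are unusable afterwards.
    elements = browser.find_elements(By.XPATH, xpath)
    hrefs = [element.get_attribute("href") for element in elements]
finally:
    browser.quit()

for href in hrefs:
    match = pattern.search(href or "")
    if match:
        print(match.group(1))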

Which yields:

160111

However, this isn't the only way. You could also fetch the page with urllib or requests and parse it with lxml, among others, as long as the links are present in the raw HTML, since those libraries do not execute JavaScript. There are many ways to extract the information.
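
For instance, if the result links are already present in the HTML the server returns (an assumption; a page that builds its links with JavaScript will not satisfy it), a browser-free version might look like the sketch below. It uses only the standard library. The sample markup and the names LinkCollector and find_n_values are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Match n=<digits> only when n is a whole query parameter name.
N_PATTERN = re.compile(r"[?&]n=(\d+)")


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_n_values(html):
    """Return the n=<digits> values found in the links of an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    matches = (N_PATTERN.search(href) for href in collector.hrefs)
    return [m.group(1) for m in matches if m]


# Made-up markup standing in for a downloaded page.
sample_html = """
<div id="baseSrch">
  <a href="/doc/?base=LAW&amp;n=160111">result</a>
  <a href="/about/">about</a>
</div>
"""
print(find_n_values(sample_html))
```

In real use, the HTML string would come from urllib.request or requests instead of the hard-coded sample.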

Upvotes: 2
