Robert

Reputation: 11

How can I use Python to log in to a website and perform actions on it?

These are the steps I need to automate:

1) Log in

2) Select an option from a drop-down menu (To access a list of products)

3) Search for something in the search field (The product we are looking for)

4) Click a link (To open up the product's options)

5) Click another link (To compile all the .pdf files relevant to said product into a bigger .pdf)

6) Wait for a .pdf to load and then download it (Save the .pdf on my machine with the name of the product as the file name)

I want to know if this is possible. If it is, where can I find out how to do it?

Upvotes: 0

Views: 3878

Answers (3)

aychedee

Reputation: 25639

Sure, just use Selenium WebDriver:

from selenium import webdriver
browser = webdriver.Chrome()

browser.get('http://your-website.com')
search_box = browser.find_element_by_css_selector('input[id=search]')

search_box.send_keys('my search term')
browser.find_element_by_css_selector('input[type=submit]').click()

That would get you through the "visit the page, enter a search term, click search" stage of your problem. Read through the API for the rest.
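As a rough sketch of those remaining steps (picking from the drop-down, clicking the links, and waiting for the PDF link to appear), using the same Selenium API as above; the element names and link texts here are made up, so replace them with the real ones from your site:

from selenium.webdriver.support.ui import Select, WebDriverWait

# Step 2: pick an option from the drop-down menu.
# 'category' and 'Products' are placeholder names.
dropdown = Select(browser.find_element_by_css_selector('select[id=category]'))
dropdown.select_by_visible_text('Products')

# Steps 4 and 5: click the product link, then the link that compiles the PDFs.
browser.find_element_by_link_text('My Product').click()
browser.find_element_by_link_text('Compile PDFs').click()

# Step 6: wait (up to 30 seconds) until the finished PDF's download link
# appears, then grab its URL so you can fetch and save the file.
WebDriverWait(browser, 30).until(
    lambda d: d.find_element_by_link_text('Download PDF'))
pdf_url = browser.find_element_by_link_text('Download PDF').get_attribute('href')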

Mechanize has problems at the moment because so much of a web page is generated via JavaScript, and if it isn't rendering that JavaScript you can't do much with the page.

It helps if you understand CSS selectors; otherwise you can find elements by id, XPath, or other locators...
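For example, the same search box from the snippet above could be located either way:

# Equivalent ways to find the search box located earlier by CSS selector
search_box = browser.find_element_by_id('search')
search_box = browser.find_element_by_xpath('//input[@id="search"]')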

Upvotes: 0

For static sites you can use the mechanize module, available from PyPI; it does everything you want, except that it does not run JavaScript and thus does not work on dynamic websites. It is also strictly Python 2 only.

easy_install mechanize
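A minimal sketch of a mechanize login (Python 2), assuming the login form is the first form on the page and that the field names are usr/pwd; check your site's actual HTML for the real names:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)      # ignore robots.txt; some sites disallow robots
br.open('http://your-website.com/login')
br.select_form(nr=0)             # select the first form on the page
br['usr'] = 'username'           # field names are assumptions
br['pwd'] = 'password'
br.submit()                      # br now carries the logged-in session cookies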

For something way more complicated you might have to use the Python bindings for Selenium (install instructions) to control an external browser, or use spynner, which embeds a web browser. However, these two are far more difficult to set up.

Upvotes: 1

Dylan

Reputation: 292

Is it pivotal that there is actual clicking involved? If you're just looking to download PDFs then I suggest you use the Requests library. You might also want to consider using Scrapy.

In terms of searching on the site, you may want to use Fiddler to capture the HTTP POST request and then replicate that in Python.
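For instance, once Fiddler shows you which fields the site sends, replaying the login and the search with Requests looks roughly like this; the URLs and field names below are placeholders for whatever your capture shows:

import requests

session = requests.Session()     # a Session keeps the login cookies between calls
session.post('http://www.example.com/login',
             data={'usr': 'username', 'pwd': 'password'})
response = session.post('http://www.example.com/search',
                        data={'q': 'product name'})
print(response.text)             # the HTML (or JSON) the site returns for the search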

Here is some code that might be useful as a starting point; these functions log in to a server and download a target file.

import requests

main_headers = {'User-Agent': 'Mozilla/5.0'}   # headers copied from the browser
proxies = {}                                   # fill in only if you need a proxy
connection = requests.Session()                # shared session keeps the login cookies

def login():
    login_url = 'http://www.example.com'
    payload = {'usr': 'username', 'pwd': 'password'}
    connection.post(login_url,
                    data=payload,
                    headers=main_headers,
                    proxies=proxies,
                    allow_redirects=True)

def download():
    directory = 'C:\\example'
    url = 'http://example.com/download.pdf'
    filename = directory + '\\' + url[url.rfind('/') + 1:]
    r = connection.get(url,
                       headers=main_headers,
                       proxies=proxies)
    file_size = int(r.headers['Content-Length'])
    block_size = 1024
    print('\tDownloading: %s [%sKB]' % (filename, file_size // 1024))
    if r.status_code == 200:
        with open(filename, 'wb') as f:
            for chunk in r.iter_content(block_size):
                f.write(chunk)
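A minimal usage sketch, with step 6 of the question (saving under the product's name) handled by renaming after the download; the path and product name are placeholders:

import os

login()
download()                       # writes C:\example\download.pdf

# Rename the downloaded file to the product's name (step 6 of the question).
os.rename('C:\\example\\download.pdf', 'C:\\example\\my-product.pdf')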

Upvotes: 1
