David Folkner

Reputation: 1199

How to use urllib to fill out forms and gather data?

I come from a world of scientific computing and number crunching.

I am trying to interact with the internet programmatically to compile data so I don't have to do it by hand. One task is to auto-fill searches on Marriott.com so I can see what the best deals are all on my own.

I've attempted something simple like

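import urllib
import urllib2

url = "http://marriott.com"

# Form fields to send; the keys must match the form's input names
values = {'Location': 'New York'}
data = urllib.urlencode(values)

# Passing data makes urllib2 send a POST instead of a GET
website = urllib2.Request(url, data)
response = urllib2.urlopen(website)
stuff = response.read()

# Save the response so it can be opened in a browser
with open('test.html', 'w') as f:
    f.write(stuff)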

My questions are the following:

  1. How do you know how the website receives information?
    How do I know a simple POST will work?
  2. If it is simple, how do I know what the keys of the "values" dictionary should be?
  3. How do I check whether it's working? The write lines at the end are my attempt to see if my inputs are working properly, but that is insufficient.

Upvotes: 1

Views: 2435

Answers (3)

shantanoo

Reputation: 3704

You may also have a look at splinter for cases where urllib is not enough (JS, AJAX, etc.). For finding out the form parameters, Firebug could be useful.

Upvotes: 1

Serial

Reputation: 8043

  1. What I do to check is use a web-debugging proxy to view the requests. First send a real request with your browser, then compare it to the request that your script sends, and try to make the two requests match.

    What I use for this is Charles Proxy

    Another way is to open the HTML file you saved (in this case test.html) in your browser and compare it to the actual response.

  2. To find out what the dictionary should contain, look at the page source and find the names of the form fields you're trying to fill. In your case, "Location" should actually be "destinationAddress.destination".

    [Picture: Name]

    So look in the HTML code to get the names of the form fields; those are what should be in the dictionary. I know that Google Chrome and Mozilla Firefox both have tools to view the structure of the HTML (in the picture I used Inspect Element in Google Chrome).
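The same lookup can be done from a script instead of by eye. Below is a minimal sketch using Python 3's standard html.parser. It assumes the form is present in the static HTML (not built by JavaScript). The markup in SAMPLE_HTML is hypothetical and only for illustration: the action path and every field name except "destinationAddress.destination" are made up, and the real page will differ.

```python
from html.parser import HTMLParser

# Hypothetical, simplified form markup; only the first field name comes
# from the discussion above, everything else is invented for the example.
SAMPLE_HTML = """
<form action="/search" method="post">
  <input type="text" name="destinationAddress.destination">
  <input type="hidden" name="token" value="abc">
  <select name="roomCount"><option>1</option></select>
  <input type="submit" value="Find">
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect each form's method, action, and the names of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # A form without a method attribute submits with GET.
            self.forms.append({
                "method": (attrs.get("method") or "get").lower(),
                "action": attrs.get("action") or "",
                "fields": [],
            })
        elif tag in ("input", "select", "textarea") and self.forms:
            # Simplification: a field is attributed to the most recently
            # opened form. Fields without a name are not submitted, so skip.
            name = attrs.get("name")
            if name:
                self.forms[-1]["fields"].append(name)


collector = FormFieldCollector()
collector.feed(SAMPLE_HTML)
for form in collector.forms:
    print(form["method"], form["action"], form["fields"])
```

The method tells you whether the form expects a GET or a POST, the action is the URL the data should go to (often not the page's own URL), and the field names are the keys for the dictionary. To run this on a saved page, feed it the contents of test.html instead of SAMPLE_HTML.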

For more info on urllib2, read here.
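To see what the script is about to send before comparing it with the browser's request, the request object can be inspected without contacting the server. This is a small sketch written for Python 3, where urllib.urlencode and urllib2.Request live in urllib.parse and urllib.request. The URL is the one from the question, and the field name is the one suggested above; whether the site accepts this request is not something the sketch checks.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "http://marriott.com"  # same URL as in the question
values = {"destinationAddress.destination": "New York"}

# In Python 3 the POST body must be bytes, hence the encode().
data = urlencode(values).encode("ascii")

# Supplying data turns the request into a POST.
post_request = Request(url, data)
print(post_request.get_method())  # POST
print(post_request.full_url)
print(post_request.data)

# Without data, the same fields go into the query string and it is a GET.
get_request = Request(url + "?" + urlencode(values))
print(get_request.get_method())   # GET
print(get_request.full_url)
```

Nothing is sent until the request is passed to urlopen, so this is a cheap way to check the method, URL, and encoded body against what the proxy or the browser's network panel shows.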

I really hope this helps :)

Upvotes: 1

user2618501

Reputation:

You need to read and analyze the HTML code of the site in question. Every browser has decent tools for inspecting the DOM of a site and analyzing its network traffic and requests.

Usually you want to use the mechanize module for performing automated interactions with a web site. There is no guarantee that this will work in every case. Nowadays many websites use AJAX or more complex client-side programming, making it hard to "emulate" a human user using Python.

Apart from that: the marriott.com site does not contain an input field "Location"... so are you guessing URL parameters without having analyzed their forms and functionality?

Upvotes: 1
