flossfan

Reputation: 10854

Use Mechanize with select field that is not inside a form?

I would like to use Mechanize (with Python) to submit a form, but unfortunately the page has been badly coded and the <select> element is not actually inside <form> tags.

So I can't use the traditional method via the form:

    forms = [f for f in br.forms()]
    mycontrol = forms[1].controls[0]

What can I do instead?

Here is the page I would like to scrape, and the relevant bit of its HTML. I'm interested in the la select element (name="la"):

    <fieldset class="searchField">
      <label>By region / local authority</label>
      <p id="regp">
        <label>Region</label>
        <select id="region" name="region"><option></option></select>
      </p>
      <p id="lap">
        <label>Local authority</label>
        <select id="la" name="la"><option></option></select>
      </p>
      <input id="byarea" type="submit" value="Go" />
      <img id="regmap" src="/schools/performance/img/map_england.png" alt="Map of regions in England" border="0" usemap="#England" />
    </fieldset>
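To see what a non-JavaScript client actually receives, the select elements can be read straight from the HTML without relying on any enclosing form. The sketch below is a minimal illustration using Python 3's standard html.parser; the class name SelectCollector is hypothetical, and the input is a trimmed copy of the snippet above. It records every select and its option values regardless of form nesting, and shows that in the served HTML the la select holds only a single empty option:

```python
from html.parser import HTMLParser


class SelectCollector(HTMLParser):
    """Collect every <select> and its option values, ignoring <form> nesting."""

    def __init__(self):
        super().__init__()
        self.selects = {}      # select name (or id) -> list of option values
        self._current = None   # name of the <select> we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'select':
            self._current = attrs.get('name') or attrs.get('id')
            self.selects[self._current] = []
        elif tag == 'option' and self._current is not None:
            # Simplification: an <option> with no value attribute is recorded
            # as '' (a browser would fall back to the option's text instead)
            self.selects[self._current].append(attrs.get('value', ''))

    def handle_endtag(self, tag):
        if tag == 'select':
            self._current = None


# Trimmed copy of the fieldset from the question
SNIPPET = '''
<fieldset class="searchField">
  <p id="regp"><select id="region" name="region"><option></option></select></p>
  <p id="lap"><select id="la" name="la"><option></option></select></p>
  <input id="byarea" type="submit" value="Go" />
</fieldset>
'''

parser = SelectCollector()
parser.feed(SNIPPET)
print(parser.selects)  # {'region': [''], 'la': ['']}
```

An empty option list here means the choices must be added later by a script, so there is nothing for Mechanize to select even if the control were inside a form.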

Upvotes: 0

Views: 600

Answers (1)

diwhyyyyy

Reputation: 6372

This is actually more complex than you think, but still easy to implement. The webpage you linked loads the list of local authorities as JSON via JavaScript, which is why the name="la" select element is empty when you load the page with Mechanize: Mechanize does not run JavaScript. The easiest way around this is to request the JSON data directly with Python and use the results to go straight to each data page.
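The JSON-to-URL step can be tried offline before touching the network. This is a minimal sketch under stated assumptions: the payload format (an array of [id, name] pairs) is inferred from how the code that follows indexes la[0] and la[1], SAMPLE_JSON is entirely hypothetical data rather than the site's real response, and the helper la_urls is a hypothetical name. GET_URL is the same per-LA template that the code that follows uses:

```python
import json

# Per-LA page template (the same GET_URL used in the code that follows)
GET_URL = 'http://www.education.gov.uk/schools/performance/geo/la%s_all.html'

# Hypothetical sample of the JSON array of [id, name] pairs; the real
# payload returned by getareas.pl may differ.
SAMPLE_JSON = '[[0, ""], [101, "Example Authority A"], [102, "Example Authority B"]]'


def la_urls(raw_json):
    """Map each LA name to its page URL, skipping an assumed ID-0 placeholder."""
    urls = {}
    for la_id, name in json.loads(raw_json):
        # Assumed placeholder entry; str() covers IDs sent as int or string
        if str(la_id) == '0':
            continue
        urls[name] = GET_URL % la_id
    return urls


for name, url in la_urls(SAMPLE_JSON).items():
    print(name, url)
```

Swapping SAMPLE_JSON for the body fetched from the getareas.pl URL gives the list of pages to download.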

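    import json
    from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen

    # The URL where we get our array of LA data
    GET_LAS = 'http://www.education.gov.uk/cgi-bin/schools/performance/getareas.pl?level=la&code=0'

    # The URL which we interpolate the LA ID into to get individual pages
    GET_URL = 'http://www.education.gov.uk/schools/performance/geo/la%s_all.html'

    def get_performance(la_id):
        """Fetch and return the raw HTML of one LA's performance page."""
        page = urlopen(GET_URL % la_id)
        return page.read()

    # Get the local authority list: each entry is an [id, name] pair
    las = json.loads(urlopen(GET_LAS).read().decode('utf-8'))

    for la in las:
        # Skip any entry whose ID is 0 (assumed placeholder). Comparing the
        # whole [id, name] pair to 0, as before, is always true.
        if str(la[0]) != '0':
            print('Processing LA ID #%s (%s)' % (la[0], la[1]))
            html = get_performance(la[0])  # parse the page here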

As you can see, you don't even need to load the page you linked or use Mechanize to do it! However, you will still need a way to parse out the school names and then the performance figures.
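For that parsing step, one option is to pull the text of every table cell out of each LA page. This is only a sketch under a loud assumption: it supposes the school names and figures sit in ordinary HTML table rows, which has not been checked against the real pages. TableRows and SAMPLE are hypothetical names, and SAMPLE is made-up markup, not data from the site. It uses only Python 3's standard html.parser and tolerates omitted </td> and </tr> tags:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being built
        self._cell = None  # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._end_row()    # tolerate a missing </tr>
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._end_cell()   # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('td', 'th'):
            self._end_cell()
        elif tag in ('tr', 'table'):
            self._end_row()

    def _end_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        if self._row is not None:
            self._end_cell()
            self.rows.append(self._row)
            self._row = None


# Hypothetical markup; the real LA pages may be structured differently
SAMPLE = '''
<table>
  <tr><th>School</th><th>Score</th></tr>
  <tr><td>Example School</td><td> 75 </td></tr>
</table>
'''

parser = TableRows()
parser.feed(SAMPLE)
print(parser.rows)  # [['School', 'Score'], ['Example School', '75']]
```

In practice you would feed it the HTML returned by get_performance and then pick out the columns you need. A dedicated parser such as BeautifulSoup or lxml is a common alternative if the markup is messier.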

Upvotes: 1
