user_78361084

Reputation: 3928

Fill out a JavaScript form with Python?

I am trying to parse an HTML page, but I need to filter the results before I parse the page.

For instance, 'http://www.ksl.com/index.php?nid=443' is a classified listing of cars in Utah. Instead of parsing ALL the cars, I'd like to filter the listing first (i.e. find all BMWs) and then parse only those pages. Is it possible to fill in a JavaScript form with Python?

Here's what I have so far:

import urllib

content = urllib.urlopen('http://www.ksl.com/index.php?nid=443').read()
# Save the page; the with block closes the file even if the write fails
with open('/var/www/bmw.html', 'w') as f:
    f.write(content)

Upvotes: 1

Views: 932

Answers (2)

daedalus

Reputation: 10923

Here is one way to do it. First download the listing page, scrape it to find the ads you are looking for, then collect the links to the individual ad pages to scrape. There is no need for JavaScript here. This example and the BeautifulSoup documentation will get you going.

from BeautifulSoup import BeautifulSoup
import urllib2

base_url = 'http://www.ksl.com'
url = base_url + '/index.php?nid=443'
model = "Honda"  # text to look for in each ad title (here a make, e.g. "BMW")

# Load the page and process with BeautifulSoup
handle = urllib2.urlopen(url)
html = handle.read()
soup = BeautifulSoup(html)

# Collect all the ad detail boxes from the page
divs = soup.findAll(attrs={"class" : "detailBox"})

# For each ad, get the title
# if it contains the word "Honda", get the link
for div in divs:
    title = div.find(attrs={"class" : "adTitle"}).text
    if model in title:
        link = div.find(attrs={"class" : "listlink"})["href"]
        link = base_url + link
        # Now you have a link that you can download and scrape
        print title, link
    else:
        print "No match: ", title

At the time of answering, this snippet looks for Honda listings and prints the following:

1995-  Honda Prelude http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817797
No match:  1994-  Ford Escort
No match:  2006-  Land Rover Range Rover Sport
No match:  2006-  Nissan Maxima
No match:  1957-  Volvo 544
No match:  1996-  Subaru Legacy
No match:  2005-  Mazda Mazda6
No match:  1995-  Chevrolet Monte Carlo
2002-  Honda Accord http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817784
No match:  2004-  Chevrolet Suburban (Chevrolet)
1998-  Honda Civic http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817779
No match:  2004-  Nissan Titan
2001-  Honda Accord http://www.ksl.com/index.php?sid=0&nid=443&tab=list/view&ad=8817770
No match:  1999-  GMC Yukon
No match:  2007-  Toyota Tacoma
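
The snippet above targets Python 2 (urllib2, the print statement, BeautifulSoup 3). For anyone on Python 3, here is a rough sketch of the same filter-by-title idea. Note that it swaps BeautifulSoup for the standard library's html.parser so it has no third-party dependency. The names AdListParser and find_matching_ads are made up for this illustration, and it assumes the page still uses the detailBox, adTitle and listlink classes from the code above, which may no longer be true.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AdListParser(HTMLParser):
    """Collect (title, href) pairs from a listing page.

    Hypothetical helper. Assumes each ad starts at an element with class
    "detailBox", with its title in an element with class "adTitle" and its
    link in an element with class "listlink" (as in the answer's code).
    """

    def __init__(self):
        super().__init__()
        self.ads = []            # finished (title, href) pairs
        self._current = None     # the ad currently being read, if any
        self._title_tag = None   # tag name of the open adTitle element

    def _flush(self):
        # Store the current ad if it has both a title and a link.
        if self._current is not None:
            title = " ".join("".join(self._current["title"]).split())
            href = self._current["href"]
            if title and href:
                self.ads.append((title, href))
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "detailBox" in classes:
            # A new ad box begins: finish the previous one first.
            self._flush()
            self._current = {"title": [], "href": None}
        elif self._current is not None:
            if "adTitle" in classes:
                self._title_tag = tag
            if "listlink" in classes and attrs.get("href"):
                self._current["href"] = attrs["href"]

    def handle_endtag(self, tag):
        # Stop collecting title text when the adTitle element closes.
        if tag == self._title_tag:
            self._title_tag = None

    def handle_data(self, data):
        if self._title_tag is not None and self._current is not None:
            self._current["title"].append(data)

    def close(self):
        super().close()
        self._flush()


def find_matching_ads(html, make, base_url):
    """Return (title, absolute_url) for every ad whose title contains make."""
    parser = AdListParser()
    parser.feed(html)
    parser.close()
    return [(title, urljoin(base_url, href))
            for title, href in parser.ads
            if make in title]
```

To use it on the live page you would download the HTML first, for example with urllib.request.urlopen(url).read().decode('utf-8', 'replace'), and pass the resulting string together with "BMW" and 'http://www.ksl.com' to find_matching_ads. Titles come back with runs of whitespace collapsed to single spaces.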

Upvotes: 2

aldux

Reputation: 2804

If you're using Python, Beautiful Soup is what you're looking for.

Upvotes: -1
