Rudy

Reputation: 25

Using beautifulsoup to get prices from craigslist

I am new to coding in Python (maybe a couple of days in) and am basically learning from other people's code on Stack Overflow. The code I am trying to write uses BeautifulSoup to get the pid and the corresponding price for motorcycles on craigslist. I know there are many other ways of doing this, but my current code looks like this:

from bs4 import BeautifulSoup         
from urllib2 import urlopen               
u = ""
count = 0
while (count < 9):
    site = "http://sfbay.craigslist.org/mca/" + str(u)
    html = urlopen(site)                      
    soup = BeautifulSoup(html)                
    postings = soup('p',{"class":"row"})                      
    f = open("pid.txt", "a")
    for post in postings:
        x = post.getText()
        y = post['data-pid']
        prices = post.findAll("span", {"class":"itempp"})
        if prices == "":
            w = 0
        else:
            z = str(prices)
            z = z[:-8]
            w = z[24:]
        filewrite = str(count) + " " + str(y) + " " +str(w) + '\n'
        print y
        print w
        f.write(filewrite)
    count = count + 1 
    index = 100 * count
    print "index is" + str(index)
    u = "index" + str(index) + ".html"

It works fine, and as I keep learning I plan to optimize it. The problem I have right now is that entries without a price are still showing up. Is there something obvious that I am missing? Thanks.

Upvotes: 2

Views: 1138

Answers (1)

That1Guy

Reputation: 7233

The problem is how you're comparing prices. You say:

prices = post.findAll("span", {"class":"itempp"})

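In BeautifulSoup, .findAll returns a list of matching elements, which is empty when nothing matches. A list never equals an empty string, so comparing prices to "" is always False and the else branch runs even for postings without a price.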

>>> [] == ""
False

Change if prices == "": to if prices == []: and everything should be fine. Since an empty list is falsy, the more idiomatic spelling is if not prices:.
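
As a rough illustration of that check, here is a minimal sketch written for Python 3 (the question's code is Python 2). The SAMPLE_HTML string and the priced_postings helper are made up for this example, and it assumes that postings without a price simply have no itempp span. It skips unpriced postings instead of writing 0, which may or may not be what you want.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure used in the question.
SAMPLE_HTML = """
<p class="row" data-pid="111"><span class="itempp">$4500</span> Honda CB</p>
<p class="row" data-pid="222">Yamaha, no price listed</p>
"""


def priced_postings(html):
    """Return (pid, price) pairs, skipping postings with no price span."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for post in soup.find_all("p", {"class": "row"}):
        prices = post.find_all("span", {"class": "itempp"})
        # find_all returns an empty list when nothing matches,
        # and an empty list is falsy.
        if not prices:
            continue
        results.append((post["data-pid"], prices[0].get_text(strip=True)))
    return results


print(priced_postings(SAMPLE_HTML))  # [('111', '$4500')]
```

Reading the text of the first matching span also avoids slicing str(prices) by fixed offsets, which breaks as soon as the markup changes slightly. To keep unpriced postings with a price of 0, as the original code intended, replace the continue with an append of (post["data-pid"], 0).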

I hope this helps.

Upvotes: 3
