Reputation: 25
I am new to coding in python (maybe a couple of days in) and basically learning of other people's code on stackoverflow. The code I am trying to write uses beautifulsoup to get the pid and the corresponding price for motorcycles on craigslist. I know there are many other ways of doing this but my current code looks like this:
from bs4 import BeautifulSoup
from urllib2 import urlopen
u = ""
count = 0
while (count < 9):
site = "http://sfbay.craigslist.org/mca/" + str(u)
html = urlopen(site)
soup = BeautifulSoup(html)
postings = soup('p',{"class":"row"})
f = open("pid.txt", "a")
for post in postings:
x = post.getText()
y = post['data-pid']
prices = post.findAll("span", {"class":"itempp"})
if prices == "":
w = 0
else:
z = str(prices)
z = z[:-8]
w = z[24:]
filewrite = str(count) + " " + str(y) + " " +str(w) + '\n'
print y
print w
f.write(filewrite)
count = count + 1
index = 100 * count
print "index is" + str(index)
u = "index" + str(index) + ".html"
It works fine and as I keep learning i plan to optimize it. The problem I have right now, is that entries without price are still showing up. Is there something obvious that I am missing. thanks.
Upvotes: 2
Views: 1138
Reputation: 7233
The problem is how you're comparing prices
. You say:
prices = post.findAll("span", {"class":"itempp"})
In BS .findAll
returns a list of elements. When you're comparing price to an empty string, it will always return false.
>>>[] == ""
False
Change if prices == "":
to if prices == []
and everything should be fine.
I hope this helps.
Upvotes: 3