Rudy

Reputation: 25

URL errors in Beautiful Soup

I am trying to obtain the data-pid and price of each posting from Craigslist using BeautifulSoup. A separate script of mine produces the file CLallsites.txt. In the code below I read each site from that text file and try to get the PIDs of all entries on the first 10 pages. My code is:

  from bs4 import BeautifulSoup       
  from urllib2 import urlopen 
  readfile = open("CLallsites.txt")
  product = "mcy"
  while 1:
    u = ""
    count = 0
    line = readfile.readline()
    commaposition = line.find(',')
    site = line[0:commaposition]
    location = line[commaposition+1:]
    site_filename = location + '.txt'
    f = open(site_filename, "a")
    while (count < 10):
       sitenow = site + "\\" + product + "\\" + str(u)
       html = urlopen(str(sitenow))                      
       soup = BeautifulSoup(html)                
       postings = soup('p',{"class":"row"})
       for post in postings:
            y = post['data-pid']
            print y
       count = count +1
       index = count*100
       u = "index" + str(index) + ".html"
    if not line:
        break
    pass             

My CLallsites.txt looks like this:

Craigslist site, location (Stack Overflow does not allow posts containing Craigslist links, so I cannot show the actual text. I could try to attach the text file if that helps.)

When I run the code, I get the following error:

  Traceback (most recent call last):
    File "reading.py", line 16, in <module>
      html = urlopen(str(sitenow))
    File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 400, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 418, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
      return self.do_open(httplib.HTTPConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
      raise URLError(err)
  urllib2.URLError:

Any ideas about what I am doing wrong?

Upvotes: 0

Views: 533

Answers (1)

A. Rodas

Reputation: 20689

I don't know what sitenow contains, but it looks like an invalid URL. Note that URLs use forward slashes, not backslashes, so the statement should be something like sitenow = site + "/" + product + "/" + str(u).
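
As a rough sketch of that idea, the URL building can be pulled into small functions so the result can be printed and checked before it is passed to urlopen. The names build_page_url and parse_line are hypothetical helpers, http://example.org is only a placeholder (not a real entry from CLallsites.txt), and the index100.html paging pattern is copied from the question's code rather than verified against the site. Stripping whitespace is an extra assumption on my part: readline() keeps the trailing newline, which would otherwise end up in the URL or the file name.

```python
def parse_line(line):
    """Split one 'site,location' line and strip whitespace and the newline."""
    site, _, location = line.partition(",")
    return site.strip(), location.strip()


def build_page_url(site, product, page):
    """Build the listing URL for a zero-based page number.

    Uses forward slashes only. The index<N>.html suffix follows the
    pattern in the question's code and is an assumption about the site.
    """
    base = site.strip().rstrip("/")
    suffix = "" if page == 0 else "index%d.html" % (page * 100)
    return base + "/" + product + "/" + suffix


# Placeholder input standing in for one line of CLallsites.txt.
site, location = parse_line("http://example.org, somewhere\n")
for page in range(3):
    # Print the URL first to confirm it looks valid before fetching it.
    print(build_page_url(site, "mcy", page))
```

Printing each URL before fetching it makes it easy to spot a malformed address, such as one built from an empty line at the end of the file.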

Upvotes: 0
