Test Dev

Reputation: 97

ValueError: can only parse strings python

I am trying to gather a bunch of links using XPath, and each of those links then needs to be scraped on the next page. However, I keep getting the error "can only parse strings". I checked the type of lk and it was a string after I cast it. What is wrong?

import unicodedata
import urllib2
from lxml import etree

def unicode_to_string(types):
    try:
        types = unicodedata.normalize("NFKD", types).encode('ascii', 'ignore')
        return types
    except:
        return types

def getData():
    req = "http://analytical360.com/access-points"
    page = urllib2.urlopen(req)
    tree = etree.HTML(page.read())
    i = 0
    for lk in tree.xpath('//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href'):
        print "Scraping Vendor #" + str(i)
        trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)))
        for ll in trees.xpath('//table[@id="archived"]//tr//td//a//@href'):
            final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)))

Upvotes: 1

Views: 7098

Answers (1)

jgritty

Reputation: 11935

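etree.HTML() can only parse strings, but urllib2.urlopen() returns a file-like response object. Pass it the page contents, i.e. the result of .read(), as you already do for the first page, instead of the response object itself.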
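To see the difference directly, here is a small reproduction. It assumes lxml is installed and uses io.BytesIO as a stand-in for the response object urlopen() returns, so it runs without network access; the HTML snippet is made up for illustration.

```python
import io

from lxml import etree

# io.BytesIO is an assumed stand-in for the file-like object that
# urlopen() returns, so the example runs without network access.
response = io.BytesIO(b'<html><body><a href="/vendor/1">v1</a></body></html>')

error_message = None
try:
    etree.HTML(response)  # passing the response object itself
except ValueError as exc:
    error_message = str(exc)  # lxml rejects anything that is not str/bytes

# Passing the page contents (bytes) instead parses normally.
tree = etree.HTML(response.read())
hrefs = tree.xpath('//a/@href')

print(error_message)
print(hrefs)
```

If you would rather hand lxml the file-like object, etree.parse() accepts one when given an etree.HTMLParser(), but it returns an ElementTree, so you would call .getroot() on the result.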

Perhaps change the code like so:

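for i, lk in enumerate(tree.xpath('//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href')):
    print "Scraping Vendor #" + str(i)
    trees = etree.HTML(urllib2.urlopen(unicode_to_string(lk)).read())
    for ll in trees.xpath('//table[@id="archived"]//tr//td//a//@href'):
        final = etree.HTML(urllib2.urlopen(unicode_to_string(ll)).read())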

Also, you never increment i, so every vendor is printed as #0. Using enumerate() on the vendor loop gives you the counter for free.
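For Python 3, where urllib2 became urllib.request, the same fix might look roughly like the sketch below. It assumes lxml is installed and keeps the XPath expressions from the question. The names fetch_bytes, get_data and the fetch parameter are hypothetical, added so the loop can be tried without network access. Resolving links with urljoin is also an assumption that only matters if the site returns relative hrefs, and collecting the parsed pages in a list (instead of discarding `final`) is an illustrative choice.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from lxml import etree

START_URL = "http://analytical360.com/access-points"
VENDOR_LINKS = '//a[@class="sabai-file sabai-file-image sabai-file-type-jpg "]//@href'
ARCHIVE_LINKS = '//table[@id="archived"]//tr//td//a//@href'


def fetch_bytes(url):
    """Download a page and return its body as bytes, not the response object."""
    with urlopen(url) as response:
        return response.read()


def get_data(start_url=START_URL, fetch=fetch_bytes):
    """Parse every archive page linked from each vendor page.

    `fetch` is a hypothetical injection point: any callable that maps a URL
    to the page body (str or bytes). By default it calls urlopen().
    """
    pages = []
    tree = etree.HTML(fetch(start_url))
    # enumerate() supplies the vendor counter that was never incremented
    for i, vendor_href in enumerate(tree.xpath(VENDOR_LINKS)):
        print("Scraping Vendor #%d" % i)
        # urljoin is an assumption: it only changes relative hrefs
        vendor_url = urljoin(start_url, vendor_href)
        vendor_tree = etree.HTML(fetch(vendor_url))
        for archive_href in vendor_tree.xpath(ARCHIVE_LINKS):
            archive_url = urljoin(vendor_url, archive_href)
            pages.append(etree.HTML(fetch(archive_url)))
    return pages
```

Calling get_data() with no arguments hits the real site; passing a dictionary's __getitem__ as fetch lets you run the same loop against canned HTML.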

Upvotes: 1
