WoooHaaaa

Reputation: 20440

Weird Python error when using lxml and XPath

I'm writing a crawler in Python. I need to parse HTML, so I imported lxml, but I get a weird error:

<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}

<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}

<type 'dict'>   
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "fetcher.py", line 78, in run
    self.extractContent(html)
  File "fetcher.py", line 151, in extractContent
    m = tree.xpath(c['xpath'])
AttributeError: 'NoneType' object has no attribute 'xpath'

<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}

Here's a piece of my code:

for c in self.contents:
  print type(c)
  print c
  m = tree.xpath(c['xpath'])

Please help me with these two questions:

  1. Why is the type dict when the error says NoneType?

  2. I'm trying to match something in the "tree", but it doesn't work. (The website is encoded in GBK; could the encoding cause this kind of problem?)

Upvotes: 0

Views: 944

Answers (2)

Martijn Pieters

Reputation: 1121148

  1. You are getting an AttributeError, which means that tree has become None and therefore has no xpath attribute. It does not mean that c is missing an xpath key; that would raise a KeyError instead.

    Clearly we are missing some code here, where tree is set to None.

  2. You are not printing the result of your tree.xpath() calls, so there is nothing in your code (as shared with us here) that prints m. The tree.xpath() calls could be working fine for all we know.

Reading between the lines and speculating a little, you may be assigning the result of some call back to tree, and that call returned None. The next time through the loop, you now have None instead of an lxml element, so the xpath() call fails with an AttributeError. Note that lxml's xpath() itself returns an empty list, not None, when a node-selecting expression matches nothing, so a failed match alone would not produce this error.
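To illustrate point 1, here is a minimal sketch of the difference between the two exceptions. It uses only the standard library; `tree = None` is an assumption standing in for whatever the missing code did, and `classify_failure` and the key `'no-such-key'` are hypothetical names made up for the demonstration.

```python
# Stand-ins for the variables in the question (values are illustrative).
c = {'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
tree = None  # assumption: tree ended up as None somewhere before the loop


def classify_failure(action):
    """Run action() and return the name of the exception it raises, if any."""
    try:
        action()
    except (AttributeError, KeyError) as exc:
        return type(exc).__name__
    return None


# Looking up .xpath on None fails with AttributeError, as in the traceback.
attribute_failure = classify_failure(lambda: tree.xpath(c['xpath']))

# A missing dictionary key would fail with KeyError instead.
key_failure = classify_failure(lambda: c['no-such-key'])

print(attribute_failure)
print(key_failure)
```

The first call fails on `tree`, not on `c`, which is why printing `type(c)` shows a perfectly healthy dict right before the crash.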

Upvotes: 1

BrenBarn

Reputation: 251345

For your first question, the error is telling you that tree is None, because tree is the object whose xpath attribute you are trying to read. But you are printing the type of c, not tree.
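One way to act on this is to inspect tree itself before the loop and fail early with a clearer message. The following is only a sketch: `run_queries` is a hypothetical helper, not part of lxml or of the asker's fetcher.py, and it assumes `tree` is whatever object the unshown code passes in.

```python
def run_queries(tree, contents):
    """Apply each entry's XPath to tree, failing early if tree is None."""
    # Print the type of tree (not c) to see what the loop really receives.
    print(type(tree))
    if tree is None:
        raise ValueError("tree is None; check the code that builds it")
    results = {}
    for c in contents:
        results[c['name']] = tree.xpath(c['xpath'])
    return results


contents = [{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}]

# With tree set to None, the problem is reported before any xpath() call.
try:
    run_queries(None, contents)
except ValueError as exc:
    message = str(exc)

print(message)
```

In the threaded crawler from the question, a check like this would point at the code that produces tree rather than at the loop over self.contents.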

I can't understand what you're asking with your second question.

Upvotes: 0
