Reputation: 20440
I'm using Python to write a crawler. Since I need to parse HTML, I import lxml, but it produces a weird error:
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 522, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/threading.py", line 477, in run
    self.__target(*self.__args, **self.__kwargs)
  File "fetcher.py", line 78, in run
    self.extractContent(html)
  File "fetcher.py", line 151, in extractContent
    m = tree.xpath(c['xpath'])
AttributeError: 'NoneType' object has no attribute 'xpath'
<type 'dict'>
{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}
Here's a piece of my code:
for c in self.contents:
    print type(c)
    print c
    m = tree.xpath(c['xpath'])
Please help me with these two questions:
Why is the type dict, but the error says NoneType?
I'm trying to match something in the "tree", but it doesn't work. (The website is encoded in GBK; could the encoding cause this kind of problem?)
Upvotes: 0
Views: 944
Reputation: 1121148
You are getting an AttributeError, which means that tree has no xpath attribute because it has become None. It does not mean that c has no xpath key; that would be a KeyError instead.
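To make the difference between the two exceptions concrete, here is a minimal sketch. FakeTree and failure_kind are hypothetical names invented for this illustration (FakeTree only stands in for a parsed tree and is not part of lxml); the dict is the one printed in your output.

```python
class FakeTree(object):
    """Hypothetical stand-in for a parsed tree; NOT part of lxml."""

    def xpath(self, expression):
        # Pretend nothing matched.
        return []


def failure_kind(tree, rule, key='xpath'):
    """Return the name of the exception raised by the lookup, or None."""
    try:
        tree.xpath(rule[key])
    except AttributeError:
        # Raised when tree itself is None: None has no xpath attribute.
        return 'AttributeError'
    except KeyError:
        # Raised when the dict has no such key.
        return 'KeyError'
    return None


rule = {'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}

print(failure_kind(None, rule))                       # tree is None
print(failure_kind(FakeTree(), rule, key='missing'))  # dict key is absent
print(failure_kind(FakeTree(), rule))                 # nothing goes wrong
```

The first call reproduces the situation in your traceback: the dict is fine, but the object on the left of .xpath is None.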
Clearly we are missing some code here, where tree is set to None.
You are not printing the result of your tree.xpath() calls, so there is nothing in your code (as shared with us here) that prints m. The tree.xpath() calls could be working fine for all we know.
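As a rough sketch of how you might check this, the loop could print what tree and m actually are, and refuse to run at all when tree is None. StubTree and extract_content are hypothetical names for this illustration; I am assuming your real method has access to tree and to a list of dicts like the one you printed.

```python
class StubTree(object):
    """Hypothetical stand-in for a parsed tree; NOT part of lxml."""

    def xpath(self, expression):
        # Pretend nothing matched.
        return []


def extract_content(tree, contents):
    """Return (name, matches) pairs, or None when there is no tree."""
    if tree is None:
        # Whatever produced tree gave us nothing; report it instead of crashing.
        print('tree is None, skipping extraction')
        return None
    results = []
    for c in contents:
        m = tree.xpath(c['xpath'])
        # Show the object being queried and what the query returned.
        print('%s -> %r' % (type(tree).__name__, m))
        results.append((c['name'], m))
    return results


contents = [{'xpath': '//ul[@id="i-detail"]/li[1]', 'name': u'\u6807\u9898'}]

extract_content(None, contents)        # reports the missing tree
extract_content(StubTree(), contents)  # prints the (empty) match list
```

With output like this you can tell apart "the tree was never built" from "the tree is fine but the expression matched nothing".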
Reading between the lines and speculating a little, you may be assigning something back to tree inside the loop, and that something turned out to be None. (Note that lxml's own xpath() returns an empty list, not None, when a node-selecting expression matches nothing, so the None most likely comes from another step.) The next time through the loop, tree is None instead of an element, so the xpath() call fails with an AttributeError.
Upvotes: 1
Reputation: 251345
For your first question, the error is telling you that tree is None, since that's the object you're trying to read the xpath attribute of. But you are printing the type of c, not tree.
I can't understand what you're asking with your second question.
Upvotes: 0