catwiesel81

Reputation: 11

python lxml xpath AttributeError (NoneType) with correct xpath and usually working

I am trying to migrate a forum to phpBB3 with Python/XPath. Although I am fairly new to both, it is going well. However, I need help with an error.

(The source file has been downloaded and processed with tagsoup.)

Firefox/Firebug show xpath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b

(in my script without tbody)

Here is an abbreviated version of my code:

from lxml import etree

forumfile = "morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)

XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"

XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"

for p in allposts:
    unreg = 0
    username = None
    username = p.find(XUSER).text  # this is where it goes haywire

When the loop hits user "tompson" (position()=11, at the end of the file), I get

AttributeError: 'NoneType' object has no attribute 'text'

I've tried various try/except/else/finally combinations, but they weren't helpful.
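For what it's worth, a sketch of what such a guard could look like: instead of catching the AttributeError, check whether find() returned None before touching .text. The XPath and the username here mirror the question; the helper name and the demo rows are made up for illustration.

```python
from lxml import etree

def extract_text(post, path, default="unknown"):
    """Return the text of the first match, or a default when find() yields None."""
    node = post.find(path)
    return node.text if node is not None else default

# Minimal demo rows: one with the expected <b> node, one without it.
row_ok = etree.fromstring("<tr><td><a/><a/><a><b>tompson</b></a></td></tr>")
row_bad = etree.fromstring("<tr><td><a/></td></tr>")

print(extract_text(row_ok, "td[1]/a[3]/b"))   # tompson
print(extract_text(row_bad, "td[1]/a[3]/b"))  # unknown
```

This only papers over the symptom, though; it does not explain why the node is missing in the first place.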

I am extracting much more information later in the script, such as the date of the post, the date of user registration, the URL and attributes of the avatar, and the content of the post...

The script works for hundreds of other files/sites of this forum.

This is not an encode/decode problem, and it is not limited to the XUSER part. If I hardcode the username, the date of registration fails instead. If I skip those, the text of the post (code below) fails...

# text of getpost
text = etree.tostring(p.find(XTEXT), pretty_print=True)

Now, this whole error would make sense if my XPath were wrong. However, all the other files work, and so do the first ten users in this file; it fails only at position()=11.

Is position() incapable of going above 10? I don't think so. Am I missing something?

Upvotes: 0

Views: 1821

Answers (1)

catwiesel81

Reputation: 11

Question answered!

I have found the answer...

I must have been very tired when I tried to fix it and came here to ask for help; I missed something quite obvious. The way I posted my problem, it was not visible here either.

  • The HTML I downloaded and processed with tagsoup had an additional tag at position 11. It was not visible on the website and broke my XPath. (It is probably crappy HTML generated by the forum, combined with tagsoup's attempt to make it parseable.) Out of more than 20,000 files, fewer than 20 are affected; this one just happened to be the first.

  • Additionally, the information is sometimes in table[4] and other times in table[5]. I had accounted for this with a function that determines the correct table. Although I tested that function a LOT and thought it worked correctly (hence did not include it above), it did not. So I wrote a better XPath:

    '/html/body/table[tr/td[@width="20%"]]/tr[position()>1]'
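To illustrate why the predicate works: it selects whichever table contains a td with width="20%", so the table's index no longer matters. The HTML below is an assumed minimal stand-in for the forum markup, not the real file.

```python
from lxml import etree

# Assumed minimal stand-in for the forum page: posts may live in any table,
# but only the post table has <td width="20%"> cells.
html = """
<html><body>
  <table><tr><td>navigation</td></tr></table>
  <table><tr><td width="20%">header</td></tr>
         <tr><td width="20%">tompson</td></tr></table>
</body></html>"""

t = etree.fromstring(html)
rows = t.xpath('/html/body/table[tr/td[@width="20%"]]/tr[position()>1]')
print(len(rows))  # 1 -- only the row after the header, in the matching table
```

The same expression matches whether the post table happens to be table[4] or table[5].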

And, although this is not related, I ran into another problem with unexpected encoding in the HTML file (not UTF-8), which was fixed by adding:

parser = etree.XMLParser(encoding='ISO-8859-15')  
t = etree.parse(forumfile, parser)

I am now confident that, after adjusting for the strange additional and duplicated tags, my code will work on all files...

Still, I will be looking into lxml.html. As I mentioned in the comment, I have never used it before, but if it is more robust and can parse the files without tagsoup, it might be a better fit and save me extensive try/except statements and loops to handle the few files that break my current script...
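A quick sketch of that idea, untested against the real forum files: lxml.html wraps libxml2's lenient HTML parser, so broken markup can often be parsed directly, without a separate tagsoup pass. The snippet and its deliberately broken input are made up for illustration.

```python
import lxml.html

# Deliberately broken markup: the <b> tag is never closed.
broken = "<html><body><table><tr><td><b>tompson</td></table>"
doc = lxml.html.fromstring(broken)

# The lenient parser recovers, so the text is still reachable.
usernames = doc.xpath("//td//text()")
print(usernames)
```

An encoding can be passed here too, via lxml.html.HTMLParser(encoding='ISO-8859-15'), mirroring the XMLParser fix above.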

Upvotes: 1
