I am using code to scrape a PDF and build a dictionary from the extracted text. My code works when I access each text block individually, e.g.:
x = scraperwiki.pdftoxml(u.read())
r = lxml.etree.fromstring(x)
s = r.xpath('//page[@number="142"]/text[@left = "134"]')
print s[8].text
Indexing s[0], s[1], and so on individually all works, but when I try the same thing with a slice:
x = scraperwiki.pdftoxml(u.read())
r = lxml.etree.fromstring(x)
s = r.xpath('//page[@number="142"]/text[@left = "134"]')
print s[0:8].text
I get this error: AttributeError: 'list' object has no attribute 'text'
Can anyone tell me what's wrong?
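The text attribute belongs to each element, not to the list. Indexing (s[8]) returns a single element, but slicing (s[0:8]) returns an ordinary Python list, and a list has no text attribute.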
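The difference between indexing and slicing can be seen without a PDF. The sketch below is written for Python 3 and, as an assumption, uses the standard-library xml.etree.ElementTree instead of lxml so it runs with no extra packages; the SAMPLE document is hypothetical and only mimics the page/text attributes used in the question's XPath, not the full scraperwiki.pdftoxml() output.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for scraperwiki.pdftoxml() output: only the
# attributes used in the question's XPath expression are included.
SAMPLE = """<pdf2xml>
  <page number="142">
    <text left="134">Alpha</text>
    <text left="134">Beta</text>
    <text left="200">Ignored</text>
    <text left="134">Gamma</text>
  </page>
</pdf2xml>"""

root = ET.fromstring(SAMPLE)
# ElementTree's findall() supports simple [@attr='value'] predicates,
# so this selects the same kind of nodes as the lxml xpath() call.
s = root.findall("page[@number='142']/text[@left='134']")

first = s[0]    # indexing yields one Element, which has .text
chunk = s[0:2]  # slicing yields a plain list of Elements

print(isinstance(first, ET.Element), first.text)  # True Alpha
print(isinstance(chunk, list))                    # True
print(hasattr(chunk, "text"))                     # False, hence the AttributeError
print([elem.text for elem in chunk])              # ['Alpha', 'Beta']
```

lxml's xpath() likewise returns a Python list for element-selecting expressions, so the same reasoning applies to the original code.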
Iterate over the elements instead:
x = scraperwiki.pdftoxml(u.read())
r = lxml.etree.fromstring(x)
s = r.xpath('//page[@number="142"]/text[@left = "134"]')
for elem in s[:8]:
    print elem.text
Or use a list comprehension:
x = scraperwiki.pdftoxml(u.read())
r = lxml.etree.fromstring(x)
s = r.xpath('//page[@number="142"]/text[@left = "134"]')
print [elem.text for elem in s[:8]]
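One related pitfall: if a text element wraps its content in child tags (for example a hypothetical <b>...</b>), elem.text is None or only partial, because .text holds just the text before the first child. Joining elem.itertext() collects all nested text; lxml elements provide itertext() as well. The sketch below is Python 3 with the standard-library ElementTree; block_texts is a hypothetical helper name and SAMPLE is made-up data, not real pdftoxml output.

```python
import xml.etree.ElementTree as ET

def block_texts(root, page, left, limit=None):
    """Hypothetical helper: full text of matching <text> elements.

    limit=None keeps every match, since lst[:None] is the whole list.
    """
    path = "page[@number='%s']/text[@left='%s']" % (page, left)
    elems = root.findall(path)[:limit]
    # itertext() walks the element and its children, so text inside
    # nested tags such as <b> is included, unlike elem.text alone.
    return ["".join(elem.itertext()) for elem in elems]

# Made-up sample; the second block wraps its text in a child tag.
SAMPLE = """<pdf2xml>
  <page number="142">
    <text left="134">Alpha</text>
    <text left="134"><b>Beta</b></text>
    <text left="134">Gamma</text>
  </page>
</pdf2xml>"""

root = ET.fromstring(SAMPLE)
print(block_texts(root, 142, 134, limit=2))        # ['Alpha', 'Beta']
print([e.text for e in root.findall("page/text")])  # ['Alpha', None, 'Gamma']
```

In the original lxml code the equivalent change would be print [''.join(elem.itertext()) for elem in s[:8]].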