Reputation: 267
I am trying to scrape a page using the code below. When I run it, I get an error on the first assignment to the titles variable: AttributeError: 'NoneType' object has no attribute 'split'.
If I replace that assignment with print(tag.text), it works as expected, and the second assignment, to the commands variable, also works as expected. Why does the first assignment raise the error?
Code:
import requests
import lxml.html as LH

s = requests.Session()
r = s.get('http://www.rebootuser.com/?page_id=1721')
root = LH.fromstring(r.text)

def getTags():
    commands = []
    titles = []
    for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
        titles += tag.text.split(',')
    for tag in root.xpath('//*/td/span/code'):
        commands += tag.text.split(',')
    zipped = zip(titles, commands)
    for item in zipped:
        print item

getTags()
Upvotes: 1
Views: 577
Reputation: 368894
In the document, some tags that match the XPath //*/tr/td[@width="54%"]/span contain a b tag as a child instead of text. Accessing the text attribute of such a tag returns None.
>>> None.split(',')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'split'
Use the text_content() method instead of the text attribute to correctly get the text content of such a tag (and its children):
for tag in root.xpath('//*/tr/td[@width="54%"]/span'):
    # titles += tag.text.split(',')
    titles += tag.text_content().split(',')
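To see the difference without fetching the real page, here is a minimal sketch. The HTML in SAMPLE is a made-up stand-in (an assumption, not the actual markup of the page in the question): one span holds plain text and the other starts with a b child, which is the case where text is None.

```python
import lxml.html as LH

# Hypothetical HTML standing in for the real page (not its actual markup).
SAMPLE = (
    '<html><body><table>'
    '<tr><td width="54%"><span>alpha,beta</span></td></tr>'
    '<tr><td width="54%"><span><b>gamma</b>,delta</span></td></tr>'
    '</table></body></html>'
)

root = LH.fromstring(SAMPLE)
spans = root.xpath('//*/tr/td[@width="54%"]/span')

# .text only holds the text before the first child element,
# so it is None for the span that begins with <b>.
raw_texts = [span.text for span in spans]

# text_content() concatenates the text of the element and all its descendants.
full_texts = [span.text_content() for span in spans]

titles = []
for span in spans:
    titles += span.text_content().split(',')

print(raw_texts)
print(full_texts)
print(titles)
```

With this sample, the second entry of raw_texts is None, which is what triggers the AttributeError in the question, while text_content() returns a string for both spans.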
Upvotes: 1