Reputation: 3084
I would like to select all of the following text:
Bold normal Italist
The html is:
<a href=""><strong>Bold</strong> normal <i>Italist</i></a>
However, a/text()
yields
normal
only. Does anyone know a fix? I'm testing Bing crawling, and the bold text is in a different position depending on the query.
Upvotes: 1
Views: 100
Reputation: 52665
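You can try
string(a)
or
normalize-space(a)
both of which return Bold normal Italist. (The a/string() form is XPath 2.0 syntax; Scrapy's selectors use XPath 1.0, where string() takes the node as its argument.)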
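As a rough sketch of the string-value approach, the snippet below runs both functions against the HTML from the question. It uses lxml directly (an assumption made to keep the example small; Scrapy's selectors are built on it, so the same expressions can be passed to sel.xpath(...)). The variable names full_text and collapsed are only illustrative.

```python
from lxml import html

doc = '<a href=""><strong>Bold</strong> normal <i>Italist</i></a>'
tree = html.document_fromstring(doc)

# string() returns the concatenated text of the element and all of its
# descendants, with the original whitespace preserved.
full_text = tree.xpath('string(//a)')

# normalize-space() computes the same string value, then trims both ends
# and collapses internal runs of whitespace to single spaces.
collapsed = tree.xpath('normalize-space(//a)')

print(full_text)   # Bold normal Italist
print(collapsed)   # Bold normal Italist
```

The two results are identical here because the markup has no extra whitespace; they would differ if the link text contained line breaks or indentation, in which case normalize-space() is probably the more convenient choice.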
Upvotes: 3
Reputation: 2594
You can use a//text()
instead of a/text()
to get all descendant text nodes; a/text() selects only the text nodes that are direct children of a.
from scrapy.selector import Selector

doc = """
<a href=""><strong>Bold</strong> normal <i>Italist</i></a>
"""
sel = Selector(text=doc, type="html")

# a/text() matches only the text nodes directly inside <a>
result = sel.xpath('//a/text()').extract()
print(result)
# >>> [' normal ']

# a//text() matches text nodes at any depth; join them into one string
result = ''.join(sel.xpath('//a//text()').extract())
print(result)
# >>> Bold normal Italist
Upvotes: 3