GRS

Reputation: 3084

Scrapy: How to get a correct selector

I would like to extract the full text of a link, including the text inside its nested tags:

Bold normal Italist

The HTML is:

<a href=""><strong>Bold</strong> normal <i>Italist</i></a>

However, a/text() yields only

normal

Does anyone know a fix? I'm testing a crawl of Bing results, and the bold text is in a different position depending on the query.

Upvotes: 1

Views: 100

Answers (2)

Andersson

Reputation: 52665

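You can try

string(a)

or

normalize-space(a)

Both return Bold normal Italist. Note that Scrapy evaluates XPath 1.0, where string() is a function that takes the node as its argument; the step form a/string() is XPath 2.0 syntax.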

Upvotes: 3

Frank Martin

Reputation: 2594

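Use a//text() instead of a/text(). a/text() matches only the text nodes that are direct children of a, while a//text() also matches the text inside nested tags such as strong and i.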

from scrapy.selector import Selector

doc = """
<a href=""><strong>Bold</strong> normal <i>Italist</i></a>
"""

sel = Selector(text=doc, type="html")

# Direct child text nodes only: the text inside <strong> and <i> is skipped
result = sel.xpath('//a/text()').getall()
print(result)
# >>> [' normal ']

# All descendant text nodes, joined in document order
result = ''.join(sel.xpath('//a//text()').getall())
print(result)
# >>> Bold normal Italist

Upvotes: 3
