XPath: get the text from multiple lines

Question

I have this HTML:

REGEN REAL ESTATE, Dubai – U.A.E

RERA ID: 12087

Specialist Licensed Property Brokers & Consultants
Residential / Commercial – Buying, Selling, R ...Read more...

I want to get all the text inside the td.

What have I tried?

normalize-space(td/text())

But I got only the last line.

What should I do to get all the lines?

paul trmbrth · Accepted Answer

You can use u"".join(selector.xpath('.//td//text()').extract()) or u"".join(selector.css('td ::text').extract()). Both select every text node under the td, at any depth, and concatenate them.
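As a rough illustration of the join approach, here is a minimal sketch. It uses lxml directly instead of Scrapy's Selector (Scrapy selectors are built on lxml, so the XPath expression should behave the same way), and it is written for Python 3. The sample markup and the helper name `td_text` are assumptions made for this example, since the original HTML was not preserved in the question. The final whitespace collapse is an extra step on top of the plain join.

```python
from lxml import html

# Hypothetical markup: the question's original HTML is not available,
# so this <td> with <br> line breaks is only an assumption.
SAMPLE = """<html><body><table><tr><td>
REGEN REAL ESTATE, Dubai – U.A.E<br>
<br>
RERA ID: 12087<br>
<br>
Specialist Licensed Property Brokers &amp; Consultants<br>
Residential / Commercial – Buying, Selling
</td></tr></table></body></html>"""


def td_text(markup):
    """Join every text node under the td elements, then collapse whitespace."""
    tree = html.fromstring(markup)
    # .//td//text() selects all text nodes below td, at any depth.
    parts = tree.xpath(".//td//text()")
    joined = "".join(parts)
    # Extra step (not in the answer): squeeze newlines and repeated spaces.
    return " ".join(joined.split())


text = td_text(SAMPLE)
print(text)
```

Without the last step, the joined string keeps the newlines that separate the original lines, which may or may not be what you want.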

I almost forgot the simplest way: if you want the full text content of a specific node, you can call normalize-space() on it directly:

paul@wheezy:~$ ipython
Python 2.7.3 (default, Jan  2 2013, 13:56:14) 
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: from scrapy.selector import Selector

In [2]: selector = Selector(text="""REGEN REAL ESTATE, Dubai – U.A.E
   ...: 
   ...: RERA ID: 12087
   ...: 
   ...: Specialist Licensed Property Brokers & Consultants
   ...: Residential / Commercial – Buying, Selling, R ...Read more...""", type="html")

In [3]: selector.xpath("normalize-space(.//td)")
Out[3]: []

In [4]: selector.xpath("normalize-space(.//td)").extract()
Out[4]: [u'REGEN REAL ESTATE, Dubai \u2013 U.A.E RERA ID: 12087 Specialist Licensed Property Brokers & Consultants Residential / Commercial \u2013 Buying, Selling, R ...Read more...']

In [5]: [td.xpath("normalize-space(.)").extract() for td in selector.css("td")]
Out[5]: [[u'REGEN REAL ESTATE, Dubai \u2013 U.A.E RERA ID: 12087 Specialist Licensed Property Brokers & Consultants Residential / Commercial \u2013 Buying, Selling, R ...Read more...']]


Remember that normalize-space() considers only the first node (in document order) of the node-set you pass as its argument, so it does what you want only when you are sure the argument matches exactly the one node you are after.
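The first-node behaviour can be seen in a small sketch. As before, this uses lxml rather than Scrapy's Selector and Python 3, and the two-cell markup is made up for illustration only.

```python
from lxml import html

# Hypothetical two-cell row, used only to show the first-node rule.
doc = html.fromstring("""<html><body><table><tr>
<td>line one<br>
line two</td>
<td>other cell</td>
</tr></table></body></html>""")

# The argument is a node-set of text nodes; only the first one is used.
first_text_node = doc.xpath("normalize-space(.//td/text())")

# The argument is a node-set of td elements; only the first td is used,
# but its string value includes all of its descendant text.
first_td = doc.xpath("normalize-space(.//td)")

# Calling normalize-space(.) once per td covers every cell.
per_td = [td.xpath("normalize-space(.)") for td in doc.xpath(".//td")]

print(first_text_node)
print(first_td)
print(per_td)
```

Looping over the matched nodes and calling normalize-space(.) on each, as in In [5] above, avoids silently dropping the later matches.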
