xpath: how to extract text before, AND within, AND after the <strong> element

Question

I am working on a Scrapy spider that uses XPath to extract the information I need. The source page is generated by the website's search function. For example, I want the items with "computer" in the title. On the source page, every occurrence of "computer" is in bold (wrapped in a <strong> element) because of the search highlighting. "computer" can appear at the beginning, in the middle, or at the end of a title, and some items don't have "computer" in the title at all. See the examples below:

Example 1: ("computer" at the beginning)

 Computer 
, used
  

Example 2: ("computer" in the middle)

Low price
 computer 
, great deal
 

Example 3: ("computer" at the end)

Don't miss this
 Computer 


Example 4: (no keyword of "computer")

Best laptop deal ever!

The XPath expression I tried, .//a[@class="title"]/text(), only returns the portion AFTER the strong element. For the above 4 examples, I get the following results:

Example 1:
, used

Example 2:
, great deal

Example 3: (Nothing)


Example 4:
Best laptop deal ever!

I need an XPath expression that covers all four situations and collects the full title of each item.

alecxe · Accepted Answer

The simplest approach would be to search for all "text" nodes and "join" them:

"".join(response.xpath('.//a[@class="title"]//text()').extract())

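Note the double slash before text(): this is the key fix here. A single slash selects only the text nodes that are direct children of the a element, while the double slash selects every descendant text node, including the text inside the strong element.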
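The difference between child and descendant text nodes can be checked without running a spider. Below is a minimal sketch using the standard library's xml.etree.ElementTree rather than Scrapy: itertext() walks every descendant text node in document order, which is roughly what //text() selects, while the anchor's own text plus the tails of its children correspond to /text(). The markup in SAMPLES is an assumed reconstruction of the four title shapes (the question's full HTML is not shown), and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the question's HTML is not shown in full, so these
# anchors are an assumed reconstruction of the four title shapes.
SAMPLES = [
    '<a class="title"><strong> Computer </strong>, used</a>',
    '<a class="title">Low price<strong> computer </strong>, great deal</a>',
    "<a class=\"title\">Don't miss this<strong> Computer </strong></a>",
    '<a class="title">Best laptop deal ever!</a>',
]


def direct_text(anchor):
    """Text nodes that are direct children of the anchor, like /text()."""
    parts = [anchor.text or ""]
    parts += [child.tail or "" for child in anchor]
    return "".join(parts)


def full_text(anchor):
    """All descendant text nodes in document order, like //text()."""
    return "".join(anchor.itertext())


titles = []
for html in SAMPLES:
    anchor = ET.fromstring(html)
    # Collapse runs of whitespace left over from the markup.
    titles.append(" ".join(full_text(anchor).split()))

for title in titles:
    print(title)
```

In a spider, the same whitespace cleanup can be applied to the joined result of the //text() query, for example with " ".join(joined.split()).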
