Extract Outer Text with Scrapy

Question

I need to parse the following fragments:

    Lekhwiya v Zobahan

or

    Sepahan v Al Nasr (UAE)

as Lekhwiya v Zobahan and Sepahan v Al Nasr (UAE), respectively.

I was trying to parse as:

    team_1 = block.xpath('.//span/text()').extract()[:2]
    team_1 = team_1[0].strip() + team_1[1].strip()
    team_2 = block.xpath('.//span/strong/text()').extract()[0]

    item['match'] = team_2.strip() + ' ' + team_1 if team_1[0] == 'v' else team_1 + ' ' + team_2.strip()

To me, this is an ugly solution. What is the best way to do this?

paul trmbrth · Accepted Answer

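You can use XPath's string() function, which returns the concatenated text of an element and all its descendants, or even normalize-space(), which additionally trims and collapses whitespace: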

In [1]: text = '''
   ...:     Lekhwiya v Zobahan
   ...:     Sepahan v Al Nasr (UAE)
   ...: '''

In [2]: import scrapy

In [3]: selector = scrapy.Selector(text=text, type="html")

In [4]: for span in selector.xpath('//span'):
   ...:     print(span.xpath('string(.)').extract_first())
   ...:     
    Lekhwiya v Zobahan
    Sepahan v Al Nasr (UAE)

In [5]: for span in selector.xpath('//span'):
   ...:     print(span.xpath('normalize-space(.)').extract_first())
   ...:     
Lekhwiya v Zobahan
Sepahan v Al Nasr (UAE)
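
To make it clearer what the two XPath functions compute, here is a minimal sketch that emulates them with only the Python standard library instead of Scrapy. The markup is an assumption: the tags are not visible in the fragments above, so the span/strong layout is guessed from the XPath expressions in the question. The helper names string_value and normalize_space are hypothetical, chosen for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Assumed markup: the original tags are not shown in the fragments, so this
# <span>/<strong> layout is only a guess based on the question's XPath.
HTML = """
<div>
  <span>
    <strong>Lekhwiya</strong> v Zobahan
  </span>
  <span>
    Sepahan v <strong>Al Nasr (UAE)</strong>
  </span>
</div>
"""


def string_value(element):
    """Concatenate all descendant text nodes in document order, like XPath string(.)."""
    return "".join(element.itertext())


def normalize_space(text):
    """Collapse runs of XPath whitespace (space, tab, CR, LF) and trim the ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


root = ET.fromstring(HTML)
matches = [normalize_space(string_value(span)) for span in root.iter("span")]
print(matches)
```

Because the text is taken from the whole span, it does not matter whether the strong element comes first or second, which is what removes the need for the team_1[0] == 'v' check. In the spider itself, something along the lines of item['match'] = block.xpath('normalize-space(.//span)').extract_first() should be enough, assuming block contains the one span of interest (when a node-set is passed, normalize-space() uses only its first node).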
