ninetynine

Reputation: 357

Scrapy: Selector returns full element with .extract (but assigns data correctly)

I have recently started learning Scrapy (and Python for that matter) but have encountered a peculiar issue that so far I have not been able to find an explanation for. I managed to find a workaround (see below), but am curious to understand the reason behind the .extract() behavior.

Running the following in my parse function:

item['stops'] = response.xpath('//td[@class="station"]/a[@href]/text()').extract

results in Scrapy writing not the extracted data to the configured output CSV, but the full string(?) below:

<bound method SelectorList.extract of 
[<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'K\xf6ln Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Siegburg/Bonn'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Frankfurt(M) Flughafen Fernbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Mannheim Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Karlsruhe Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Offenburg'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Freiburg(Breisgau) Hbf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Basel Bad Bf'>, 
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Basel SBB'>]>

The selectors hold the correct data (see the data= fields above), but the plain strings are not what ends up in the item. Other fields that I fill with .re() instead of .extract() work fine. Surprisingly, the above query also works if I run it as follows:

item['stops'] = response.xpath('//td[@class="station"]/a[@href]/text()').re('.*')

Upvotes: 1

Views: 1481

Answers (1)

Goran

Reputation: 6824

Hope it helps

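from scrapy.selector import Selector
sel = Selector(response)
# Call extract() with parentheses; [0] keeps only the first match
item['stops'] = sel.xpath('//td[@class="station"]/a/@href').extract()[0]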
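
The `<bound method SelectorList.extract of [...]>` text in the question is what Python prints for a method that is referenced but not called: `.extract` without parentheses stores the method object itself, while `.re('.*')` worked because it was actually called. The sketch below shows the difference without needing Scrapy. `FakeSelectorList` is a hypothetical stand-in written for this illustration, not Scrapy's real class, and it holds plain strings instead of selectors.

```python
class FakeSelectorList(list):
    """Hypothetical stand-in for Scrapy's SelectorList (assumption for illustration)."""

    def extract(self):
        # Return the underlying data as a plain list of strings.
        return [str(x) for x in self]


stops = FakeSelectorList(["Köln Hbf", "Siegburg/Bonn", "Basel SBB"])

# No parentheses: this stores a reference to the method, not its result.
not_called = stops.extract

# With parentheses: the method runs and returns the list of strings.
called = stops.extract()

print(not_called)  # prints a "<bound method ... of [...]>" representation
print(called)      # prints the list of station names
```

In the original spider, the equivalent fix would be to write `.extract()` with parentheses at the end of the `response.xpath(...)` expression from the question.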

Upvotes: 1
