Reputation: 357
I have recently started learning Scrapy (and Python for that matter) but have encountered a peculiar issue that so far I have not been able to find an explanation for. I managed to find a workaround (see below), but am curious to understand the reason behind the .extract() behavior.
Running the following in my parse function:
item['stops'] = response.xpath('//td[@class="station"]/a[@href]/text()').extract
results in Scrapy writing not the data to the configured output CSV, but the following string(?) instead:
<bound method SelectorList.extract of
[<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'K\xf6ln Hbf'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Siegburg/Bonn'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Frankfurt(M) Flughafen Fernbf'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Mannheim Hbf'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Karlsruhe Hbf'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Offenburg'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Freiburg(Breisgau) Hbf'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Basel Bad Bf'>,
<Selector xpath='//td[@class="station"]/a[@href]/text()' data=u'Basel SBB'>]>
The data is selected correctly but is not passed through as such to the item. Other fields that use .re() instead of .extract() work fine. Surprisingly, the above query also works fine if I run it as follows:
item['stops'] = response.xpath('//td[@class="station"]/a[@href]/text()').re('.*')
Upvotes: 1
Views: 1481
Reputation: 6824
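You are assigning the method itself rather than calling it: .extract needs parentheses, i.e. .extract(). Hope it helps: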
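from scrapy.selector import Selector
sel = Selector(response)
# extract() is called with parentheses; [0] keeps only the first match
item['stops'] = sel.xpath('//td[@class="station"]/a/@href').extract()[0]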
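To illustrate why the output in the question starts with "<bound method", here is a minimal sketch in plain Python. FakeSelectorList is a hypothetical stand-in written for this example, not Scrapy's real SelectorList; it only mimics the fact that extract is a method on a list-like object.

```python
class FakeSelectorList(list):
    """Hypothetical stand-in for Scrapy's SelectorList (not the real class)."""

    def extract(self):
        # Return the plain strings held in the list
        return [str(x) for x in self]


stops = FakeSelectorList(["Mannheim Hbf", "Basel SBB"])

without_call = stops.extract   # bound method object; extract has not run
with_call = stops.extract()    # the method ran; this is a list of strings

print(callable(without_call))  # the attribute is still a callable method
print(repr(without_call))      # repr begins with "<bound method"
print(with_call)               # the extracted strings
```

This would also explain why the .re('.*') variant in the question works: there the method is actually called, so a list is assigned rather than the method object.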
Upvotes: 1