Reputation: 11240
I'm crawling the following page: http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/
The first parse pass should collect all the links that have the score as their text. I first loop through all the match rows:
for sel in response.xpath('(//table[@class="standard_tabelle"])[1]/tr'):
I then get the links in the 6th column of the table:
matchHref = sel.xpath('.//td[6]/a/@href').extract()
This, however, returns nothing. The same selector works in Chrome (with 'tbody' added between the table and tr steps) and returns results there. But when I try it (without the tbody) in scrapy shell, only the first response.xpath returns results; the follow-up link extraction returns nothing.
I've done a handful of these loops before, but this simple thing has me stumped. Is there a better way to debug this? Here is some shell output where I simplify the second selection to match any td:
In [36]: for sel in response.xpath('(//table[@class="standard_tabelle"])[1]/tr'):
   ....:     sel.xpath('.//td')
   ....:
Nothing. Ideas?
Upvotes: 1
Views: 496
Reputation: 473763
What I would do is use the fact that the links in the 6th column contain "report" in the href attribute value. Demo from the shell:
$ scrapy shell "http://www.worldfootball.net/all_matches/eng-premier-league-2015-2016/"
>>> for row in response.xpath('(//table[@class="standard_tabelle"])[1]/tr[not(th)]'):
... print(row.xpath(".//a[contains(@href, 'report')]/@href").extract_first())
...
/report/premier-league-2015-2016-manchester-united-tottenham-hotspur/
/report/premier-league-2015-2016-afc-bournemouth-aston-villa/
/report/premier-league-2015-2016-everton-fc-watford-fc/
...
/report/premier-league-2015-2016-stoke-city-west-ham-united/
/report/premier-league-2015-2016-swansea-city-manchester-city/
/report/premier-league-2015-2016-watford-fc-sunderland-afc/
/report/premier-league-2015-2016-west-bromwich-albion-liverpool-fc/
Also note the tr[not(th)] part: it skips the header rows, which contain no relevant links.
Upvotes: 1