Reputation: 5770
I am parsing a page that has 20 hrefs pointing to the next pages.
But one of the entries doesn't have an href.
That causes my code to fail:
i = 1000
j = 0
dataLen = len(response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]'))
photoNodes = response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]')
for photoNode in photoNodes:
    contentHref = photoNode.xpath('.//a/@href').extract_first()
    yield Request(contentHref, callback=self.parse_page, priority=i, dont_filter=True)
    i -= 1
    j += 1
# start parse next page
def parse_page(self, response):
    global countLen, dataLen
    enName = response.xpath('//*[@class="movie_intro_info_r"]/h3/text()').extract_first()
    cnName = response.xpath('//*[@class="movie_intro_info_r"]/h1/text()'
    ...
I tried adding the check if not (photoNode is None): (and also if not photoNode == "":), but it still does not work:
i = 1000
j = 0
dataLen = len(response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]'))
photoNodes = response.xpath('//div[@class="rank_list table rankstyle1"]//div[@class="tr"]')
for photoNode in photoNodes:
    if not (photoNode is None):
        contentHref = photoNode.xpath('.//a/@href').extract_first()
        # photoHref = photoNode.xpath('.//a/img/@src').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority=i, dont_filter=True)
        i -= 1
        j += 1
    else:
        pass
twRanking['movie'] = movieArray
I have no idea how to skip an entry that may not have an href.
Any help would be appreciated. Thanks in advance.
Upvotes: 0
Views: 116
Reputation: 3057
It seems that you need to check whether contentHref is empty, not photoNode. photoNode will contain information in any case, so it will never be empty, whereas extract_first() returns None when no href matches. Try something like this:
for photoNode in photoNodes:
    contentHref = photoNode.xpath('.//a/@href').extract_first()
    if contentHref:
        # photoHref = photoNode.xpath('.//a/img/@src').extract_first()
        yield Request(contentHref, callback=self.parse_page, priority=i, dont_filter=True)
        i -= 1
        j += 1
    else:
        pass
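If you want to try the skip logic without running the spider, here is a minimal sketch that uses only the standard library. It is a stand-in for the Scrapy selectors, not Scrapy itself: the SAMPLE markup and the helper names first_href and links_with_priority are made up for illustration, and first_href only imitates how extract_first() gives back None when nothing matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: three rows, the second one has no <a href>.
SAMPLE = """
<div class="rank_list">
  <div class="tr"><a href="/movie/1">One</a></div>
  <div class="tr"><span>No link here</span></div>
  <div class="tr"><a href="/movie/3">Three</a></div>
</div>
"""


def first_href(row):
    """Return the first href found under row, or None if there is none."""
    for anchor in row.iter("a"):
        href = anchor.get("href")
        if href is not None:
            return href
    return None


def links_with_priority(root, start_priority=1000):
    """Yield (href, priority) for rows that have a link and skip the rest."""
    priority = start_priority
    for row in root.findall(".//div[@class='tr']"):
        href = first_href(row)
        if not href:  # None or empty string: nothing to request
            continue
        yield href, priority
        priority -= 1  # only consume a priority slot for real links


root = ET.fromstring(SAMPLE)
results = list(links_with_priority(root))
print(results)
```

The row itself is always a valid node, so testing the row never filters anything. Only the extracted href can be missing, which is why the guard sits on href.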
Upvotes: 2