hyeri

Reputation: 693

My Scrapy is not Scraping anything (blank csv file)

I'm trying to scrape the top 100 T20 batsmen from the ICC site, but the CSV file I get is blank. There are no errors in my code (at least none that I know of). Here is my items file:

import scrapy

class DmozItem(scrapy.Item):
    Ranking = scrapy.Field()
    Rating = scrapy.Field()
    Name = scrapy.Field()
    Nationality = scrapy.Field()
    Carer_Best_Rating = scrapy.Field()

And the dmoz_spider file:

import scrapy

from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "espn"
    allowed_domains = ["relianceiccrankings.com"]
    start_urls = ["http://www.relianceiccrankings.com/ranking/t20/batting/"]

    def parse(self, response):
        #sel = response.selector
        #for tr in sel.css("table.top100table>tbody>tr"):
        for tr in response.xpath('//table[@class="top100table"]/tr'):
            item = DmozItem()
            item['Ranking'] = tr.xpath('//td[@class="top100id"]/text()').extract_first()
            item['Rating'] = tr.xpath('//td[@class="top100rating"]/text()').extract_first()
            item['Name'] = tr.xpath('td[@class="top100name"]/a/text()').extract_first()
            item['Nationality'] = tr.xpath('//td[@class="top100nation"]/text()').extract_first()
            item['Carer_Best_Rating'] = tr.xpath('//td[@class="top100cbr"]/text()').extract_first()
            yield item

What is wrong with my code?

Upvotes: 0

Views: 460

Answers (2)

Steve

Reputation: 976

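To answer your Ranking problem: the XPath for Ranking starts with '//', which searches the whole document from its root rather than from the current row, so every iteration returns the first match on the page. It needs to be relative to tr instead. Remove the leading '//' from every XPath in the for loop that has it, for example: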

item['Ranking'] = tr.xpath('td[@class="top100id"]/text()').extract_first()
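The difference can be seen in a small sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors (an assumption, so the snippet runs without extra installs), and the markup is invented sample data rather than the real page. Searching from the root inside the loop plays the role of the '//' expressions, and searching from the row plays the role of the relative ones.

```python
import xml.etree.ElementTree as ET

# Invented sample markup shaped like the rankings table (not the real page).
HTML = """
<table class="top100table">
  <tr><td class="top100id">1</td><td class="top100name"><a>Player A</a></td></tr>
  <tr><td class="top100id">2</td><td class="top100name"><a>Player B</a></td></tr>
  <tr><td class="top100id">3</td><td class="top100name"><a>Player C</a></td></tr>
</table>
""".strip()

root = ET.fromstring(HTML)
rows = root.findall("tr")

# Searching from the document root inside the loop:
# every iteration returns the first matching cell on the page.
from_root = [root.find(".//td[@class='top100id']").text for _ in rows]

# Searching relative to the current row:
# every iteration returns that row's own cell.
from_row = [tr.find("td[@class='top100id']").text for tr in rows]

print(from_root)  # ['1', '1', '1']
print(from_row)   # ['1', '2', '3']
```

With the root-based search, all three rows report ranking 1, which is the same symptom the '//' expressions produce in the spider.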

Upvotes: 0

Rafael Almeida

Reputation: 5240

The website you're trying to scrape has a frame in it, and that frame is the page you actually want to scrape.

start_urls = [
    "http://www.relianceiccrankings.com/ranking/t20/batting/"
]

This is the correct URL

There are also several other problems:

  • To select elements, use the response itself. You don't need to create a variable from response.selector; select straight from it with response.xpath('//foo/bar').

  • Your CSS selector for the table is wrong. top100table is a class rather than an id, so it should be .top100table and not #top100table.

Here is the XPath for it:

response.xpath("//table[@class='top100table']/tr")

tbody isn't part of the page's HTML source; it only appears when you inspect the page with a modern browser, which inserts it into the DOM.

  • The extract() method always returns a list rather than the element itself, so you need to take the first match, like this:

item['Ranking'] = tr.xpath('td[@class="top100id"]/a/text()').extract_first()
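The tbody point above can be checked with a rough sketch. It again uses the standard library's xml.etree.ElementTree instead of Scrapy (an assumption, to keep it dependency-free), and the markup is invented sample data that, like the raw page source described above, contains no tbody element.

```python
import xml.etree.ElementTree as ET

# Invented sample markup: like the raw page source, it has no <tbody>.
RAW = """
<table class="top100table">
  <tr><td class="top100id">1</td></tr>
  <tr><td class="top100id">2</td></tr>
</table>
""".strip()

table = ET.fromstring(RAW)

# A path copied from the browser inspector, which shows an inserted <tbody>.
with_tbody = table.findall("tbody/tr")

# A path written against the raw source.
without_tbody = table.findall("tr")

print(len(with_tbody))     # 0
print(len(without_tbody))  # 2
```

A selector that includes tbody matches nothing here, so the loop body never runs and no items are produced.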

Hope this helps, have fun scraping!

Upvotes: 2
