Python: Scrapy spider doesn't return results?

Question

I know I need to work on my selectors to narrow in on more specific data, but I don't understand why my CSV is completely empty.

My spider class:

```
class MySpider(BaseSpider):
    name = "wikipedia"
    allowed_domains = ["en.wikipedia.org/"]
    start_urls = ["http://en.wikipedia.org/wiki/2014_in_film"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//table[@class="wikitable sortable jquery-tablesorter"], [@style="margin:auto; margin:auto;"]')
        items = []
        for title in titles:
            item = WikipediaItem()
            item["title"] = title.select("td/text()").extract()
            item["url"] = title.select("a/text()").extract()
            items.append(item)
        return items
```

The HTML I'm trying to crawl is the "Highest-grossing films of 2014" table, which renders as:

| Rank | Title | Studio | Worldwide gross |
|------|-------|--------|-----------------|
| 1 | Transformers: Age of Extinction | Paramount Pictures | $1,091,404,499 |

A row like this one repeats for every film, so the spider should grab all of them once the selector is correct.

I know the issue isn't in exporting, because even in my shell it says "Crawled 0 pages, Scraped 0 items", so really nothing is getting touched.
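A minimal, standard-library sketch of why the table selector in the question can come back empty. It assumes that the HTML Scrapy downloads has `class="wikitable sortable"` on the table and that `jquery-tablesorter` is added later by JavaScript in the browser, which Scrapy does not run. An `[@class='...']` test compares the whole attribute string, so one extra class name is enough to match nothing. `SERVED_HTML` is a hypothetical, simplified stand-in for the page. (Separately, the `, [@style=...]` part of the original expression is not valid XPath 1.0; a union is written with `|`.)

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the downloaded page.
# Assumption: the server sends class="wikitable sortable"; "jquery-tablesorter"
# would only be added afterwards by JavaScript in a browser.
SERVED_HTML = """
<html><body>
  <table class="wikitable sortable">
    <tr><th>Rank</th><th>Title</th><th>Studio</th><th>Worldwide gross</th></tr>
    <tr><td>1</td><td>Transformers: Age of Extinction</td>
        <td>Paramount Pictures</td><td>$1,091,404,499</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(SERVED_HTML)

# [@class='...'] compares the entire attribute value, not individual class names.
browser_class = root.findall(".//table[@class='wikitable sortable jquery-tablesorter']")
served_class = root.findall(".//table[@class='wikitable sortable']")

print(len(browser_class))  # 0: nothing matched, so the spider's loop never runs
print(len(served_class))   # 1
```

In newer Scrapy versions, a CSS selector such as `response.css("table.wikitable")` matches on a single class name rather than the whole attribute string, which avoids this exact-match trap.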

larrywgray · Accepted Answer

The table is not the repeating element; the table row is.
You need to change your code to select the table rows, i.e.:
```
titles = hxs.select('//tr')
```
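To see why the row, not the table, is the repeating element, here is a standard-library sketch over a hypothetical two-film table (the film names, studios and grosses are placeholders, not real data). Selecting the table yields a single node, so a loop over it runs at most once; selecting `tr` yields one node per row, including the header row, which has `th` cells instead of `td`.

```python
import xml.etree.ElementTree as ET

# Placeholder markup: a header row plus two films (not real data).
PAGE = """
<html><body>
  <table class="wikitable sortable">
    <tr><th>Rank</th><th>Title</th><th>Studio</th><th>Worldwide gross</th></tr>
    <tr><td>1</td><td>Film A</td><td>Studio A</td><td>$200</td></tr>
    <tr><td>2</td><td>Film B</td><td>Studio B</td><td>$100</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(PAGE)

tables = root.findall(".//table")  # the container: one node for the whole table
rows = root.findall(".//tr")       # the repeating unit: one node per row

# Only rows with <td> cells hold film data; the first row holds <th> headers.
data_rows = [row for row in rows if row.find("td") is not None]

print(len(tables), len(rows), len(data_rows))  # 1 3 2
```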

Then loop through them and use XPath to get your data:

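```
for title in titles:
    # Header rows contain <th> cells, so the XPath matches nothing there; skip them
    film_titles = title.xpath("./td/i/a/@title").extract()
    if not film_titles:
        continue
    item = WikipediaItem()
    item["title"] = film_titles[0]  # extract() returns strings, not selectors
    item["url"] = title.xpath("./td/i/a/@href").extract()[0]
    items.append(item)
```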
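The per-row extraction can be tried without Scrapy. This sketch mirrors the `./td/i/a` path from the answer using the standard library's ElementTree, which cannot select attributes directly, so `@title` and `@href` become `.get(...)` calls. Plain dicts stand in for `WikipediaItem`; `scrape_films` and the markup are hypothetical placeholders, and the `<td><i><a ...>` nesting is taken from the answer's XPath rather than checked against the live page.

```python
import xml.etree.ElementTree as ET

# Placeholder markup following the <td><i><a> nesting the answer's XPath expects.
PAGE = """
<html><body>
  <table class="wikitable sortable">
    <tr><th>Rank</th><th>Title</th><th>Studio</th><th>Worldwide gross</th></tr>
    <tr><td>1</td><td><i><a href="/wiki/Film_A" title="Film A">Film A</a></i></td>
        <td>Studio A</td><td>$200</td></tr>
    <tr><td>2</td><td><i><a href="/wiki/Film_B" title="Film B">Film B</a></i></td>
        <td>Studio B</td><td>$100</td></tr>
  </table>
</body></html>
""".strip()


def scrape_films(html):
    """Return one dict per film row (hypothetical helper; dicts stand in for WikipediaItem)."""
    root = ET.fromstring(html)
    items = []
    for row in root.iter("tr"):
        # Same path as the answer's "./td/i/a"; the header row has no match.
        link = row.find("./td/i/a")
        if link is None:
            continue
        items.append({"title": link.get("title"), "url": link.get("href")})
    return items


films = scrape_films(PAGE)
print(films)
# [{'title': 'Film A', 'url': '/wiki/Film_A'}, {'title': 'Film B', 'url': '/wiki/Film_B'}]
```

Skipping rows with no match, rather than indexing `[0]` blindly, is what keeps the header row from raising an `IndexError`.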
