Reputation: 1396
I'm using Scrapy for the first time and I can't get this spider to return anything. Can someone help me understand what I'm doing wrong?
    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector
    from idcode.items import StatuteItem

    class IdCodeSpider(BaseSpider):
        name = "idcode"
        allowed_domains = ["idaho.gov"]
        start_urls = ["http://legislature.idaho.gov/idstat/Title1/T1CH1SECT1-101.htm"]

        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            item = StatuteItem()
            item['title'] = hxs.select("//table/tbody/tr[1]/td[2]/div[2]/div[1]/div[1]/text()").extract()
            return item
I know everything else in my project is working because if I add item['title'] = "test" above return item, it returns "test". So I must have something wrong with my XPath, but I tested that in the Chrome Developer Console and it's working there.
Upvotes: 0
Views: 266
Reputation: 1396
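Removing tbody from the XPath resolved the issue. Browsers such as Chrome insert a tbody element into the DOM when they render a table, even if the page source does not contain one, so an XPath tested in the Developer Console can include a tbody that is absent from the raw HTML Scrapy receives.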
item['title'] = hxs.select("//table/tr[1]/td[2]/div[2]/div[1]/div[1]/text()").extract()
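To see the effect in isolation, here is a minimal sketch using only the standard library. It assumes a simplified, made-up table (the RAW_HTML string below is hypothetical, not the real Idaho page) and uses xml.etree.ElementTree as a stand-in for Scrapy's selector, so it only illustrates the path matching, not Scrapy itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the raw page source. Note that it
# contains no <tbody>, like markup that has not been normalized by a browser.
RAW_HTML = """
<html><body>
<table>
  <tr>
    <td>nav</td>
    <td><div>header</div><div><div><div>TITLE 1</div></div></div></td>
  </tr>
</table>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# Path copied from the browser's view of the DOM (includes tbody).
with_tbody = root.findall(".//table/tbody/tr[1]/td[2]/div[2]/div[1]/div[1]")
# Same path with tbody removed, matching the raw markup.
without_tbody = root.findall(".//table/tr[1]/td[2]/div[2]/div[1]/div[1]")

titles_with = [el.text for el in with_tbody]
titles_without = [el.text for el in without_tbody]

print(titles_with)     # []
print(titles_without)  # ['TITLE 1']
```

If you are unsure whether a page really contains tbody, checking the raw response (for example with "view source" rather than the element inspector) shows the markup the spider will actually parse.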
Upvotes: 1
Reputation: 7577
If you mainly want the extracted content rather than to write the scraper yourself, you can use the Goose project. It only handles text and media, but I have used it many times without any problems.
Here is the link:
https://github.com/grangier/python-goose
Upvotes: 0