I'm trying to extract data from different 'tables' inside a 'Main Table' on the same page (same URL). The item fields have the same XPath and the same structure in all sub-tables, so the only problem I am facing is how to add multiple XPath expressions for the table sections on this page.
Here is what my code looks like:
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from tutorial.items import TutorialItem


class MySpider(BaseSpider):
    name = "test"
    allowed_domains = ["blabla.com"]
    start_urls = ["http://www.blablabl..com"]  # The start URL doesn't change: same page

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        # select() already returns a list of row selectors, one per matching <tr>
        titles = hxs.select('//tr[@class="index class_tr group-6487"]')
        # Here I would like to have multiple XPath selectors, e.g.:
        # titles = hxs.select('//tr[@class="index class_tr group-6488"]')
        # titles = hxs.select('//tr[@class="index class_tr group-6489"]')
        # Each one is for a table section within the same 'Main Table'
        items = []
        for title in titles:
            item = TutorialItem()
            item['name'] = title.select('td[3]/span/a/text()').extract()
            item['encryption'] = title.select('td[5]/text()').extract()
            item['compression'] = title.select('td[8]/text()').extract()
            item['resolution'] = title.select('td[7]/span/text()').extract()
            items.append(item)
        return items
I would appreciate any hint on whether this is achievable. If I write a different spider for each table section, I will end up with 10 spiders for the same URL/table, and I am not sure whether the data could then be written to the same CSV file in order.
Upvotes: 2
Views: 1992
Reputation: 834
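Try combining the expressions with the XPath union operator (|), so a single select() call matches the rows of all sections:
titles = hxs.select('//tr[@class="index class_tr group-6487"] | //tr[@class="index class_tr group-6488"] | //tr[@class="index class_tr group-6489"]')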
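With around 10 sections, writing the union by hand gets long, so it may be easier to build the expression from a list of group ids. The snippet below is only a sketch: `ROW_XPATH` and `build_union_xpath` are hypothetical names I made up (they are not part of Scrapy), and the ids are the three from the question.

```python
# Hypothetical helper (not part of Scrapy): builds one XPath union
# expression that covers several table sections.
ROW_XPATH = '//tr[@class="index class_tr group-{group_id}"]'


def build_union_xpath(group_ids):
    """Join one row XPath per group id with the XPath union operator '|'."""
    return " | ".join(ROW_XPATH.format(group_id=g) for g in group_ids)


# Group ids taken from the question; extend the list for the other sections.
xpath = build_union_xpath([6487, 6488, 6489])
print(xpath)
```

Inside parse() the result would then be passed to the selector as hxs.select(xpath), and the rest of the loop stays the same. As far as I know, a union returns its nodes in document order, so the rows should reach the CSV in the order they appear on the page, but that is worth checking against your actual output.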
Upvotes: 2