Reputation: 103
I am newbie using xpath
,
I wanna extract every single title, body, link , release date from this link
everthing seems okay, but no on body, how to extract every single body on nested xPath, thanks before :)
here my source
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from thehack.items import ThehackItem
class MySpider(BaseSpider):
name = "thehack"
allowed_domains = ["thehackernews.com"]
start_urls = ["http://thehackernews.com/search/label/mobile%20hacking"]
def parse(self, response):
hxs = HtmlXPathSelector(response)
titles = hxs.xpath('//article[@class="post item module"]')
items = []
for titles in titles:
item = ThehackItem()
item['title'] = titles.select('span/h2/a/text()').extract()
item['link'] = titles.select('span/h2/a/@href').extract()
item['body'] = titles.select('span/div/div/div/div/a/div/text()').extract()
item['date'] = titles.select('span/div/span/text()').extract()
items.append(item)
return items
anybody can fix about body blok? only on body...
thanks before mate
here the picture of inspection elements from the website
Upvotes: 0
Views: 533
Reputation: 1548
I think you where struggling with the selectors, right? I think you should check the documentation for selectors, there's a lot of good information there. In this particular example, using the css selectors, I think it would be something like:
class MySpider(scrapy.Spider):
name = "thehack"
allowed_domains = ["thehackernews.com"]
start_urls = ["http://thehackernews.com/search/label/mobile%20hacking"]
def parse(self, response):
for article in response.css('article.post'):
item = ThehackItem()
item['title'] = article.css('.post-title>a::text').extract_first()
item['link'] = article.css('.post-title>a::attr(href)').extract_first()
item['body'] = ''. join(article.css('[id^=summary] *::text').extract()).strip()
item['date'] = article.css('[itemprop="datePublished"]::attr(content)').extract_first()
yield item
It would be a good exercise for you to change them to xpath selectors and maybe also check about ItemLoaders, together are very useful.
Upvotes: 1