Reputation: 1592
I am scraping http://gaana.com/. I want to get the list of editor's pick albums, but I am unable to scrape the page and I don't know what is wrong with my code. My spider code:
import scrapy
from tutorial.items import GannaItem


class GannaSpider(scrapy.Spider):
    name = 'gannaspider'
    start_urls = ["http://www.songspk.link/"]

    def parse(self, response):
        for sel in response.xpath('/html/body'):
            item = GannaItem()
            item['Albumname'] = sel.xpath('div[4]/div[4]/div[2]/div[1]/div[5]/div/ul/li[1]/div/div[2]/a[1]/span/text()').extract()
            item['link'] = sel.xpath('div[4]/div[4]/div[2]/div[1]/div[3]/div/div[2]/div/ul/li[1]/div/div[2]/a/@href').extract()
            yield item
And this is the output I am getting:
{'Albumname': [], 'link': []}
Upvotes: 0
Views: 121
Reputation: 1233
There are a couple of problems in your code.
Your XPath expressions are quite complicated. You probably generated them with a tool like Portia. I would rather go with class names. As I explained here, indices (like div[4]) should be avoided to make your XPath expressions more robust. I radically reduced the complexity by using class names, which also makes the expressions easier to debug.
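To illustrate why positional indices are fragile, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) instead of Scrapy's selectors, so it runs without crawling anything. The markup, the banner div, and the helper album_names are all made up for the illustration; they are not the real site's HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page markup (an assumption, not the real site).
PAGE = """<body>
  <div class="header">Site header</div>
  <div class="content">
    <ul class="songs-list1">
      <li><a class="link" href="/album-one">Album One</a></li>
      <li><a class="link" href="/album-two">Album Two</a></li>
    </ul>
  </div>
</body>"""

# The same page after a new block is inserted above the content.
PAGE_WITH_BANNER = PAGE.replace(
    '<div class="content">',
    '<div class="banner">Ad</div>\n  <div class="content">',
)

# Index-based path: "the second div under body".
INDEX_PATH = "./div[2]/ul/li/a"
# Class-based path: "links inside the list with this class".
CLASS_PATH = ".//ul[@class='songs-list1']/li/a[@class='link']"


def album_names(markup, path):
    """Return the text of every element matched by path (hypothetical helper)."""
    body = ET.fromstring(markup)
    return [a.text for a in body.findall(path)]


print(album_names(PAGE, INDEX_PATH))              # matches on the original layout
print(album_names(PAGE_WITH_BANNER, INDEX_PATH))  # div[2] is now the banner
print(album_names(PAGE_WITH_BANNER, CLASS_PATH))  # unaffected by the new div
```

The index-based path silently returns an empty list once the layout shifts by one element, which is the same symptom as the empty lists in the question.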
If you are using nested selectors (like you are doing with your for loop), you subsequently have to use relative paths (starting with ./), as explained here.
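A rough sketch of the difference, again with the standard library's xml.etree.ElementTree standing in for Scrapy's selectors and with made-up markup. In Scrapy, an expression starting with // inside the loop searches the whole document even when called on a nested selector; the sketch imitates that by searching from the document root, and contrasts it with a search relative to each row.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, assumed only for this illustration.
PAGE = """<body>
  <ul class="songs-list1">
    <li><a class="link" href="/album-one">Album One</a></li>
    <li><a class="link" href="/album-two">Album Two</a></li>
  </ul>
</body>"""

root = ET.fromstring(PAGE)
rows = root.findall(".//ul[@class='songs-list1']/li")

# Document-wide search repeated for each row: every row sees every link.
document_wide = [
    [a.get("href") for a in root.findall(".//a[@class='link']")]
    for _ in rows
]

# Search relative to the current row: each row sees only its own link.
per_row = [
    [a.get("href") for a in row.findall(".//a[@class='link']")]
    for row in rows
]

print(document_wide)
print(per_row)
```

With the document-wide search every item ends up with the links of all rows, whereas the relative search yields one link per item, which is what the spider needs.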
This code will do what you want:
import scrapy
from tutorial.items import GannaItem


class GannaSpider(scrapy.Spider):
    name = 'gannaspider'
    start_urls = ["http://www.songspk.link/"]

    def parse(self, response):
        for sel in response.xpath('//ul[@class="songs-list1"]/li[not(@class="title violett")]'):
            item = GannaItem()
            item['Albumname'] = sel.xpath('.//a[@class="link"]//text()').extract()
            item['link'] = sel.xpath('.//a[@class="link"]/@href').extract()
            yield item
Upvotes: 2