Aman Kumar

Reputation: 1592

How to scrape the songs using scrapy

I am scraping http://gaana.com/. I want to get the list of editor's pick albums, but I am unable to scrape it and I don't know what is wrong with my code. My spider code:

import scrapy
from tutorial.items import GannaItem


class GannaSpider(scrapy.Spider):
    name = 'gannaspider'
    start_urls = ["http://www.songspk.link/"]

    def parse(self, response):
        for sel in response.xpath('/html/body'):
            item = GannaItem()
            item['Albumname'] = sel.xpath('div[4]/div[4]/div[2]/div[1]/div[5]/div/ul/li[1]/div/div[2]/a[1]/span/text()').extract()
            item['link'] = sel.xpath('div[4]/div[4]/div[2]/div[1]/div[3]/div/div[2]/div/ul/li[1]/div/div[2]/a/@href').extract()
        yield item

And this is the output I am getting:

{'Albumname': [], 'link': []}

Upvotes: 0

Views: 121

Answers (1)

dron22

Reputation: 1233

There are a couple of problems in your code.

  1. Your XPath expressions are quite complicated. You probably generated them with a tool like Portia. I would rather go with class names. As I explained here, indices (like div[4]) should be avoided to make your XPath expressions more robust. I radically reduced the complexity by using class names, which also makes the expressions easier to debug.

  2. If you are using nested selectors (as you are doing with your for loop), you have to use relative paths (starting with ./), as explained here.

This code will do what you want:

import scrapy
from tutorial.items import GannaItem


class GannaSpider(scrapy.Spider):
    name = 'gannaspider'
    start_urls = ["http://www.songspk.link/"]

    def parse(self, response):
        for sel in response.xpath('//ul[@class="songs-list1"]/li[not(@class="title violett")]'):
            item = GannaItem()
            item['Albumname'] = sel.xpath('.//a[@class="link"]//text()').extract()
            item['link'] = sel.xpath('.//a[@class="link"]/@href').extract()
            yield item
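
To illustrate why index-based paths are fragile while class-based, relative paths keep working, here is a minimal sketch. It does not use Scrapy: it uses Python's standard xml.etree.ElementTree, which supports only a subset of XPath, as a stand-in for Scrapy selectors. The markup and the helper names albums_by_index and albums_by_class are made up for the example; only the class names songs-list1 and link come from the spider above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified page markup (an assumption for illustration).
PAGE = """
<html><body>
  <div id="header"/>
  <div id="content">
    <ul class="songs-list1">
      <li><a class="link" href="/album-a.html">Album A</a></li>
      <li><a class="link" href="/album-b.html">Album B</a></li>
    </ul>
  </div>
</body></html>
"""

# The same page after the site inserts a banner <div> before the content.
PAGE_CHANGED = PAGE.replace(
    '<div id="header"/>', '<div id="header"/><div id="banner"/>'
)

# Index-based path: depends on the content being the second <div>.
INDEX_PATH = "./body/div[2]/ul/li/a"
# Class-based path: finds the list wherever it sits in the tree.
CLASS_PATH = ".//ul[@class='songs-list1']/li"


def albums_by_index(markup):
    """Collect album names with the positional path."""
    root = ET.fromstring(markup)
    return [a.text for a in root.findall(INDEX_PATH)]


def albums_by_class(markup):
    """Collect name and link per <li>, using paths relative to each <li>."""
    root = ET.fromstring(markup)
    items = []
    for li in root.findall(CLASS_PATH):
        # Relative path (starts with ./): search only inside this <li>.
        link = li.find("./a[@class='link']")
        items.append({"Albumname": link.text, "link": link.get("href")})
    return items


if __name__ == "__main__":
    print(albums_by_index(PAGE))          # positional path on the original page
    print(albums_by_index(PAGE_CHANGED))  # same path after the layout change
    print(albums_by_class(PAGE_CHANGED))  # class-based path after the change
```

In a Scrapy spider the same idea would be written with response.xpath(...) for the outer loop and sel.xpath('./...') for the per-item fields, as in the spider above.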

Upvotes: 2
