Manuel

Reputation: 802

Spider not parsing data once it enters the page

I'm trying to scrape Amazon's website for products. After getting a basic scraping process working, I tried to add some "complexity" to the program.

My idea was to read certain keywords from a .txt file. With those keywords, I used the search bar to get the products that matched them and scraped the data. That worked just fine.
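
A minimal sketch of that keyword step, assuming one keyword per line in the file. The helper names `load_keywords` and `build_search_urls` are hypothetical, and the URL template is an assumption modelled on the search URL that appears as the referer in the crawl log further down.

```python
from urllib.parse import quote_plus

# Assumed template, modelled on the referer URL in the crawl log
SEARCH_URL = "https://www.amazon.com/s?field-keywords={}"


def load_keywords(lines):
    """Return stripped, non-empty keywords from an iterable of text lines."""
    return [line.strip() for line in lines if line.strip()]


def build_search_urls(keywords, template=SEARCH_URL):
    """Fill each keyword, URL-encoded, into the search URL template."""
    return [template.format(quote_plus(keyword)) for keyword in keywords]


if __name__ == "__main__":
    # In the spider these lines would come from open('productosABuscar.txt')
    sample_lines = ["Laptop\n", "\n", "Running Shoes\n"]
    for url in build_search_urls(load_keywords(sample_lines)):
        print(url)
```

Stripping matters because `readlines()` keeps the trailing newline of each line, which would otherwise end up inside the request URL.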

The problem is that, depending on the keyword (for example, Laptop and Shoes), the parser needs to work differently: shoes have different sizes, colors and such, so the data I need to scrape from a "shoe" product is different from the data I need from a "Laptop" product. And that's where I'm at.

With some help from the people on this site, I was able to have a different parser called depending on the word the spider got from the .txt. The code looks something like this:

import re

from scrapy import Request

def start_requests(self):
    # Read one keyword per line, dropping trailing newlines and blank lines
    with open('productosABuscar.txt', 'r') as txtfile:
        keywords = [line.strip() for line in txtfile if line.strip()]

    for keyword in keywords:
        yield Request(self.search_url.format(keyword))

def parse_item(self, response):
    # Here I get the keyword for comparison later
    category = re.sub('Back to search results for |"', '', response.xpath('normalize-space(//span[contains(@class, "a-list-item")]//a/text())').extract_first())
    # Here I get the product URL for the next parser
    productURL = response.request.url

    if category == 'Laptop':
        yield response.follow(productURL, callback=self.parse_laptop)

def parse_laptop(self, response):
    laptop_item = LaptopItem()

    # Parsing things

    yield laptop_item

This should work fine, but when I run the spider from the Anaconda console, no data is scraped. The weird thing is that the spider is actually accessing every "Laptop" item on the Amazon page but not scraping the data from it.

In the console, I can see every link the spider is accessing, with a statement like this one, for example:

2018-12-27 10:02:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.amazon.com/Acer-Aspire-i3-8130U-Memory-E5-576-392H/dp/B079TGL2BZ/ref=sr_1_3/ref=sr_1_acs_bss_3_4?ie=UTF8&qid=1545915651&sr=8-3-acs&keywords=Laptop> (referer: https://www.amazon.com/s?field-keywords=Laptop)

Is there something wrong with the arrangement of the parser or is it a deeper issue?

Upvotes: 0

Views: 70

Answers (1)

ThunderMind

Reputation: 799

Does it reach the parse_laptop function? If it does, what do you get: an empty {}, nothing at all, or an error?
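
One possible cause, offered as an assumption since the full spider isn't shown: `parse_item` requests `response.request.url`, which is the page that was just crawled, and Scrapy's scheduler drops requests to URLs it has already seen unless they are created with `dont_filter=True`. If that is what happens here, `parse_laptop` would never be called. One way around it is to hand the response you already have straight to the category parser. The sketch below shows only that dispatch pattern in plain Python; `FakeResponse`, `DispatchSketch`, `parse_shoes`, and the item dicts are hypothetical stand-ins, not Scrapy classes or code from the question.

```python
class FakeResponse:
    """Hypothetical stand-in for Scrapy's Response, holding only what the sketch needs."""

    def __init__(self, url, category):
        self.url = url
        self.category = category


class DispatchSketch:
    """Plain-Python sketch of the dispatch; a real spider would subclass scrapy.Spider."""

    def parse_item(self, response):
        # Map each category to the parser that knows its fields
        parsers = {"Laptop": self.parse_laptop, "Shoes": self.parse_shoes}
        parser = parsers.get(response.category)
        if parser is not None:
            # Reuse the response already downloaded instead of requesting the URL again
            yield from parser(response)

    def parse_laptop(self, response):
        yield {"type": "laptop", "url": response.url}

    def parse_shoes(self, response):
        yield {"type": "shoes", "url": response.url}


if __name__ == "__main__":
    spider = DispatchSketch()
    print(list(spider.parse_item(FakeResponse("https://example.com/dp/1", "Laptop"))))
```

In the actual spider, the equivalent would be `yield from self.parse_laptop(response)` inside `parse_item`, or keeping the `response.follow(...)` call and passing `dont_filter=True`.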

Upvotes: 1
