Scrapy extracting content from HTML no output

Question

I want to pull the parts under the tr-mfgPartNumber class from this HTML, but I get no output.

At first I thought the problem was my syntax for selecting each class, but changing it still produced no output.

I also tried adding another for loop that starts from the whole table body class. If anyone can check whether the way I'm calling the classes is wrong, that would be great!

import scrapy

class DigiSpider(scrapy.Spider):
    name = 'digi'
    allowed_domains = ['digikey.com']
    start_urls = ['https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?FV=-1%7C428%2C-8%7C774%2C7%7C1/']

    def parse(self, response):
        data={}
        parts=response.css('tbody.InkPart')
        for part in parts:
            for p in part.css('td.tr-mfgPartNumber'):
                data['href'] = p.css('a::attr(href)').extract()
                yield data

Below is the HTML

[markup not preserved; only these two text values remain]

    428-3574-2-ND
    CY62157EV30LL-45ZSXIT

muhallilahnaf · Accepted Answer

When I ran the same code, Scrapy received an empty response, so the site was probably detecting and blocking the spider. After I set a browser-like user agent, it worked.
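One way to test the blocking theory outside Scrapy is to request the same URL with and without a browser-like User-Agent and compare what comes back. The following is a minimal stdlib sketch, not part of the original answer: build_request and fetch_summary are hypothetical helper names, and the live calls are left commented out because the site's behavior may differ from what is described here.

```python
from urllib.request import Request, urlopen

URL = ("https://www.digikey.com/products/en/integrated-circuits-ics/"
       "memory/774?FV=-1%7C428%2C-8%7C774%2C7%7C1/")
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36")


def build_request(url, user_agent=None):
    """Return a Request, optionally carrying a browser-like User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return Request(url, headers=headers)


def fetch_summary(request, timeout=10):
    """Fetch the page and return (HTTP status, body length in bytes).

    A blocked request may raise urllib.error.HTTPError instead of
    returning an empty body.
    """
    with urlopen(request, timeout=timeout) as resp:
        return resp.status, len(resp.read())


plain = build_request(URL)
browser_like = build_request(URL, BROWSER_UA)

# urllib stores header names capitalized, hence "User-agent" here.
print(plain.get_header("User-agent"))         # None: nothing set explicitly
print(browser_like.get_header("User-agent"))  # the browser-like string

# Uncomment to compare live responses (requires network access):
# print(fetch_summary(plain))
# print(fetch_summary(browser_like))
```

If the two live calls return clearly different results, the user agent is a plausible cause; if they match, the empty response likely has another explanation.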

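Here's the code below. I also changed "tbody.InkPart" to "tbody#lnkPart": lnkPart is an id, not a class, so the selector needs "#" instead of ".", and the name starts with a lowercase "l", not a capital "I". That outer selector isn't strictly needed, since there's only one tbody tag.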

import scrapy


class DigiSpider(scrapy.Spider):
    name = 'digi'
    allowed_domains = ['digikey.com']
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    }
    start_urls = ['https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?FV=-1%7C428%2C-8%7C774%2C7%7C1/']

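    def parse(self, response):
        # lnkPart is the id of the table body, so select it with '#', not '.'
        parts = response.css('tbody#lnkPart')
        for part in parts:
            for p in part.css('td.tr-mfgPartNumber'):
                # Yield a fresh dict per cell instead of mutating one shared dict
                yield {'href': p.css('a::attr(href)').extract()}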
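To see why the selector change matters, it can help to check whether a name appears on an element as an id or as a class. The sketch below uses only the standard library so it runs without Scrapy. The HTML fragment is hypothetical: only the id lnkPart, the class tr-mfgPartNumber, and the part number come from this thread, while the href and the surrounding structure are assumptions. AttrAudit and audit are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical fragment; the href value is made up for illustration.
SAMPLE = """
<table>
  <tbody id="lnkPart">
    <tr>
      <td class="tr-mfgPartNumber">
        <a href="/example/part">CY62157EV30LL-45ZSXIT</a>
      </td>
    </tr>
  </tbody>
</table>
"""


class AttrAudit(HTMLParser):
    """Record (tag, id, class list) for every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        self.seen.append((tag, attr.get("id"), classes))


def audit(html):
    """Parse html and return the recorded start-tag attributes."""
    parser = AttrAudit()
    parser.feed(html)
    parser.close()
    return parser.seen


seen = audit(SAMPLE)

# What 'tbody.InkPart' asks for: a tbody whose class list contains InkPart.
by_class = [tag for tag, _id, classes in seen
            if tag == "tbody" and "InkPart" in classes]
# What 'tbody#lnkPart' asks for: a tbody whose id is lnkPart.
by_id = [tag for tag, elem_id, _classes in seen
         if tag == "tbody" and elem_id == "lnkPart"]

print(by_class)  # []
print(by_id)     # ['tbody']
```

Inside a Scrapy session, the equivalent check is to compare response.css('tbody.InkPart') with response.css('tbody#lnkPart') and see which one returns any elements.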
