Jtenc

Reputation: 19

Scrapy: extracting content from HTML gives no output

I want to extract the part numbers under the tr-mfgPartNumber class from this HTML, but my spider produces no output.

The HTML is below:

<tbody id="lnkPart" cookie-tracking="ref_page_event=Select Part;available_parameters=[&quot;s&quot;,&quot;pv1989&quot;,&quot;pv142&quot;,&quot;pv2042&quot;,&quot;pv2192&quot;,&quot;pv276&quot;,&quot;pv252&quot;,&quot;pv16&quot;,&quot;pv1291&quot;];">

<tr>
    
    <td class="tr-compareParts" align="center">
        <input type="checkbox" name="part" value="428-3574-2-ND" id="428-3574-2-ND" onclick="partClick();">
        <label title="Compare Parts" for="428-3574-2-ND"></label>
    </td>
    
    <td class="tr-datasheet">
            <a class="lnkDatasheet" href="https://www.cypress.com/file/43021/download" target="_blank" track-data="ref_page_event=Display Asset;page_title=Datasheet;asset_type=Datasheet">
                <img class="datasheet-img" src="//www.digikey.com/Web%20Export/Common/icons/datasheet.png" alt="CY62157EV30LL-45ZSXIT Datasheet" title="CY62157EV30LL-45ZSXIT Datasheet">
            </a>
    </td>
    
    <td class="tr-image">
        <a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
            <img class="pszoomer" zoomimg="//media.digikey.com/Renders/Cypress%20Semi%20Renders/428;51-85087;Z,ZS;44.jpg" border="0" height="64" src="//media.digikey.com/Renders/Cypress%20Semi%20Renders/428;51-85087;Z,ZS;44_tmb.jpg" alt="CY62157EV30LL-45ZSXIT - Cypress Semiconductor Corp" title="CY62157EV30LL-45ZSXIT - Cypress Semiconductor Corp">
        </a>
    </td>
    
    <td class="tr-dkPartNumber nowrap-culture">                             
        <a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
            428-3574-2-ND
        </a>
            <div class="product-indicator-collection">
<a class="align-indicator-collection" href="javascript:msgBox('#dlgRohs');">
<img class="rohs-foilage" src="//www.digikey.com/web%20export/common/mkt/en/leaf.png" border="0" alt="This part is RoHS compliant." title="This part is RoHS compliant.">
</a>


</div>
       
    </td>
    
    <td class="tr-mfgPartNumber">
        <a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
            <span>CY62157EV30LL-45ZSXIT</span>
        </a>
    </td>

Upvotes: 0

Views: 76

Answers (1)

muhallilahnaf

Reputation: 50

When I ran the same code, Scrapy got an empty response. The site may have been detecting and blocking the spider. After I set a browser user agent, it worked.

Here's the code (I also changed "tbody.InkPart" to "tbody#lnkPart". In your HTML, lnkPart is the id of the tbody, not a class, so the selector needs "#" rather than ".". The tbody part is not strictly needed, since there's only one tbody tag):

import scrapy


class DigiSpider(scrapy.Spider):
    name = 'digi'
    allowed_domains = ['digikey.com']
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    }
    start_urls = ['https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?FV=-1%7C428%2C-8%7C774%2C7%7C1/']

    def parse(self, response):
        # '#' selects by id; the table body in the question has id="lnkPart"
        for cell in response.css('tbody#lnkPart td.tr-mfgPartNumber'):
            # Yield a fresh dict per row so items do not share one mutable object
            yield {
                'href': cell.css('a::attr(href)').get(),
            }

Upvotes: 1
