Reputation: 80
Here's the code I'll be working with (I'm using scrapy)
def start_requests(self):
start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42']
This is where I'm storing all my URLS
Here is how I'm trying to only print everything after the '='
productSKU = response.url.split("=")[-1]
item["productSKU"] = productSKU
Here is the output:
{'productPrice': '1,449.95',
'productSKU': 'https://www.lowes.com/pd/ZLINE-KITCHEN-BATH-Ducted-Red-Matte-Wall-Mounted-Range-Hood-Common-42-Inch-Actual-42-in/1001440644'}
So now here's the problem:
The URLs I'm inputting will eventually be populated with
https://www.lowes.com/search?searchTerm = {something}
and that's why I would like to use {something} to ensure I'll have every item that I attempted to scrape on the CSV (for sorting and matching purposes).
The URL I'm using redirects to me this URL:
(Input)https://www.lowes.com/search?searchTerm=8654RM-42
->
And so, my output for productSKU is the entire redirect URL instead of just whatever is after the '=' sign. The output I would like would be 8654RM-42.
And here is my whole program
# -*- coding: utf-8 -*-
import scrapy
from ..items import LowesspiderItem
from scrapy.http import Request
class LowesSpider(scrapy.Spider):
name = 'lowes'
def start_requests(self):
start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42']
for url in start_urls:
yield Request(url, cookies={'sn':'2333'}) #Added cookie to bypass location req
def parse(self, response):
items = response.css('.grid-container')
for product in items:
item = LowesspiderItem()
#get product price
productPrice = product.css('.art-pd-price::text').get()
productSKU = response.url.split("=")[-1]
item["productSKU"] = productSKU
item["productPrice"] = productPrice
yield item
Upvotes: 0
Views: 213
Reputation: 1487
you need to use meta
to pass in the input url like this
def start_requests(self):
start_urls = ['https://www.lowes.com/search?searchTerm=8654RM-42']
for url in start_urls:
yield Request(url, cookies={'sn':'2333'},meta={'url':url)
def parse(self,response):
url = response.meta['url'] #your input url
Upvotes: 3