Reputation: 11
I'm trying to crawl the product details from this webpage https://www.goo-net.com/php/search/summary.php with scrapy-selenium.
Since I want the detail information for each product, I first collect every product URL from the summary page, then pass each one to a second callback that parses the detail page.
I've tried a lot of solutions, but my output never shows anything.
Here is my code:
import scrapy
import selenium
from scrapy_selenium import SeleniumRequest
from selenium.webdriver.common.keys import Keys


class Goonet1Spider(scrapy.Spider):
    name = 'goonet1'

    def start_requests(self):
        yield SeleniumRequest(
            url='https://www.goo-net.com/php/search/summary.php',
            wait_time=4,
            callback=self.parse
        )

    def parse(self, response):
        links = response.xpath("//*[@class='heading_inner']/h3/a")
        url_detail = []
        for link in links:
            url = response.urljoin(link.xpath(".//@href").get())
            url_detail.append(url)
        for i in url_detail:
            yield SeleniumRequest(
                url=i,
                wait_time=4,
                callback=self.parse_item
            )

    def parse_item(self, response):
        base_price = response.xpath("//table[@class='mainData']/tbody/tr[2]/td[1]/span/text()").get()
        yield {
            'base_price': base_price
        }
Here is my settings.py:

DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800
}

# SELENIUM
from shutil import which
SELENIUM_DRIVER_NAME = 'chrome'
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
SELENIUM_DRIVER_ARGUMENTS = ['--headless']  # use '-headless' if using firefox instead of chrome
Please help me.
Upvotes: -1
Views: 395
Reputation: 193
Add the base URL to each entry in url_detail to complete the link:
def parse(self, response):
    links = response.xpath("//*[@class='heading_inner']/h3/a")
    url_detail = []
    for link in links:
        # keep the raw relative href; the base URL is added below
        url = link.xpath(".//@href").get()
        url_detail.append(url)
    for i in url_detail:
        link = "https://www.goo-net.com" + i
        yield SeleniumRequest(
            url=link,
            wait_time=4,
            callback=self.parse_item
        )
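Note that response.urljoin (a thin wrapper around urllib.parse.urljoin from the standard library) already resolves a relative href against the page URL, so prefixing the base URL by hand is only needed when you keep the raw href. A minimal sketch with a hypothetical relative path:

```python
from urllib.parse import urljoin

base = "https://www.goo-net.com/php/search/summary.php"
# hypothetical relative href, as it might appear in the page source
href = "/usedcar/detail.html"

# urljoin resolves the href against the page URL, producing an
# absolute link without any manual string concatenation
print(urljoin(base, href))  # https://www.goo-net.com/usedcar/detail.html
```

Applying both urljoin and a string prefix to the same href would double the domain and produce a broken URL, so pick one of the two approaches.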
Upvotes: 0