Reputation: 25
Hi everyone, I'm new to web scraping and am currently working on scraping Amazon for the price of a product. This one is just an example (the Echo Dot 3, because that's the first product I found).
I am confused about how to store the data, though. Until now I have only run the code from the terminal with the command scrapy crawl Amazon -o amazon.json. This runs the crawler "Amazon" and stores the output in the JSON file "amazon.json". I don't actually want to store the data in a file like this, though. What I want is to run the crawler when I run the actual Python file. Would I have to create an instance of the Amazon spider? Or maybe somehow run the terminal command with os.system?
Anyway, here's the code:
import scrapy

class AmazonSpider(scrapy.Spider):
    name = "Amazon"
    start_urls = [
        'https://www.amazon.de/Echo-Dot-3-Gen-Intelligenter-Lautsprecher-mit-Alexa-Sandstein-Stoff/dp/B07PDHSPXT/ref=sr_1_1?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=3TC0DPYYXLIJW&dchild=1&keywords=echo+dot&qid=1594659298&sprefix=echo%2Caps%2C176&sr=8-1'
    ]

    def parse(self, response):
        # the element in which the price resides
        for price in response.xpath("//td[@class='a-span12']"):
            yield {
                # the element of the price tag; the leading "." keeps the
                # search inside the current td instead of the whole page
                'price_text': price.xpath(".//span[@id='priceblock_ourprice']/text()").get()
            }

Thank you all in advance!
Upvotes: 0
Views: 113
Reputation: 17368
import scrapy
from scrapy.crawler import CrawlerProcess

class AmazonSpider(scrapy.Spider):
    name = "Amazon"
    start_urls = [
        'https://www.amazon.de/Echo-Dot-3-Gen-Intelligenter-Lautsprecher-mit-Alexa-Sandstein-Stoff/dp/B07PDHSPXT/ref=sr_1_1?__mk_de_DE=%C3%85M%C3%85%C5%BD%C3%95%C3%91&crid=3TC0DPYYXLIJW&dchild=1&keywords=echo+dot&qid=1594659298&sprefix=echo%2Caps%2C176&sr=8-1'
    ]

    def parse(self, response):
        # the element in which the price resides
        for price in response.xpath("//td[@class='a-span12']"):
            yield {
                # relative XPath (".//") so the search stays inside the current td
                'price_text': price.xpath(".//span[@id='priceblock_ourprice']/text()").get()
            }

# Run the spider from this script instead of using "scrapy crawl"
process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(AmazonSpider)
process.start()  # blocks here until the crawl has finished
Upvotes: 1