Reputation: 37
So I've created a spider in Scrapy that now successfully targets all the text I want.
How exactly do you execute this spider from another Python file? I want to be able to pass it new URLs and store the data it finds in a dictionary and then a DataFrame.
At the moment I can only get it to run with the terminal command 'scrapy crawl SpiderName'.
from scrapy.spiders import Spider
from scrapy_splash import SplashRequest


class SpiderName(Spider):
    name = 'SpiderName'
    Page = 'https://www.urlname.com'

    def start_requests(self):
        yield SplashRequest(url=self.Page, callback=self.parse,
                            endpoint='render.html',
                            args={'wait': 0.5},
                            )

    def parse(self, response):
        for x in response.css("div.row.list"):
            yield {
                'Entry': x.css("span[data-bind]::text").getall()
            }
Thanks
Upvotes: 1
Views: 1383
Reputation: 142681
In the Scrapy documentation, Common Practices shows how to Run Scrapy from a script:
import scrapy
from scrapy.crawler import CrawlerProcess


class MySpider(scrapy.Spider):
    # ... Your spider definition ...


# ... run it ...
process = CrawlerProcess(settings={ ... })
process.crawl(MySpider)
process.start()  # the script will block here until the crawling is finished
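Since you also want the scraped data in a dictionary and then a DataFrame, you can connect a handler to the item_scraped signal before starting the crawl and collect every yielded item in a list. A minimal sketch, assuming pandas is installed and that your SpiderName class is importable (the import path below is just an example, adjust it to your project):

import pandas as pd
from scrapy import signals
from scrapy.crawler import CrawlerProcess

# hypothetical path - change it to wherever your SpiderName class lives
from myproject.spiders.spidername import SpiderName

collected_items = []  # every item yielded by the spider is appended here


def item_scraped(item, response, spider):
    # called once for every item the spider yields
    collected_items.append(dict(item))


process = CrawlerProcess()
crawler = process.create_crawler(SpiderName)
crawler.signals.connect(item_scraped, signal=signals.item_scraped)
process.crawl(crawler)
process.start()  # blocks until the crawl is finished

df = pd.DataFrame(collected_items)  # one row per scraped item
print(df.head())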
If you add your own __init__
class MySpider(scrapy.Spider):

    def __init__(self, urls, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.start_urls = urls
then you can run it with the urls as a parameter:
process.crawl(MySpider, urls=['http://books.toscrape.com/', 'http://quotes.toscrape.com/'])
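Applied to the Splash spider from your question, the same idea looks roughly like this (a sketch, not tested against your site): take a urls list in __init__ and yield one SplashRequest per url instead of the single Page attribute:

from scrapy.spiders import Spider
from scrapy_splash import SplashRequest


class SpiderName(Spider):
    name = 'SpiderName'

    def __init__(self, urls=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.urls = urls or []

    def start_requests(self):
        # one Splash request per url passed in from the calling script
        for url in self.urls:
            yield SplashRequest(url=url, callback=self.parse,
                                endpoint='render.html',
                                args={'wait': 0.5})

    def parse(self, response):
        for x in response.css("div.row.list"):
            yield {'Entry': x.css("span[data-bind]::text").getall()}

Then start it with process.crawl(SpiderName, urls=[...]) as shown above.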
Upvotes: 1