Reputation: 160
I have the following Scrapy spider in spiders.py:
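import scrapy


class TwitchSpider(scrapy.Spider):
    name = "clips"

    def start_requests(self):
        urls = [
            'https://www.twitch.tv/wilbursoot/clips?filter=clips&range=7d'
        ]
        # Schedule each URL; without yielding requests the spider fetches nothing
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for clip in response.css('.tw-tower'):
            yield {
                'title': clip.css('::text').get()
            }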
The key point is that I want to call this spider as a function from another file, instead of running scrapy crawl clips
in the console. Is this possible at all, and where can I read more about it? I checked the Scrapy documentation, but I didn't find much.
Upvotes: 0
Views: 762
Reputation: 4822
Run the spider from main.py:
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    # When a string is passed, it must be the spider's `name` attribute, not the class name
    spider = 'clips'
    # Loads settings.py; run this from inside the Scrapy project directory
    settings = get_project_settings()
    # change/update settings:
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    # Blocks until the crawl has finished
    process.start()
Upvotes: 1
Reputation: 17
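Put your other file in the same directory as your spider file. Then import the spider class from that file (spiders.py in the question):
from spiders import TwitchSpider
You then have access to the class and can create a spider object:
spi = TwitchSpider()
You can then call methods on that object. Note that parse() takes a response argument, so it cannot be called with no arguments; you have to pass it a response object:
spi.parse(response)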
This article shows how to import classes and functions from other Python files: https://csatlas.com/python-import-file-module/
Upvotes: 0