neiii

Reputation: 160

Python Scrapy -> Use a scrapy spider as a function

So I have the following Scrapy spider in spiders.py:

import scrapy

class TwitchSpider(scrapy.Spider):
    name = "clips"

    def start_requests(self):
        urls = [
            'https://www.twitch.tv/wilbursoot/clips?filter=clips&range=7d'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for clip in response.css('.tw-tower'):
            yield {
                'title': clip.css('::text').get()
            }

But the key point is that I want to call this spider as a function from another file, instead of running scrapy crawl clips in the console. Where can I read more about this, or is it possible at all? I checked the Scrapy documentation but didn't find much.

Upvotes: 0

Views: 762

Answers (2)

SuperUser

Reputation: 4822

Run the spider from main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    spider = 'clips'  # the spider's "name" attribute, resolved via the project's spider loader
    settings = get_project_settings()
    # change/update settings:
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()

See "Run Scrapy from a script" in the Scrapy documentation.
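Alternatively, if you would rather not rely on the spider loader resolving the name, you can import the spider class and pass it to process.crawl directly. A minimal sketch, assuming spiders.py is importable from main.py:

from scrapy.crawler import CrawlerProcess

from spiders import TwitchSpider  # assumes spiders.py is on the Python path

def run_clips_spider():
    # Wrapping the crawl in a function lets other code call it like any other function.
    process = CrawlerProcess(settings={
        'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    })
    process.crawl(TwitchSpider)  # pass the spider class itself
    process.start()              # blocks until the crawl is finished

if __name__ == "__main__":
    run_clips_spider()

Note that process.start() starts the Twisted reactor, which cannot be restarted within the same process, so a function like this can only be called once per Python process.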

Upvotes: 1

Sam

Reputation: 17

Put your other file in the same directory as your spider file, then import the spider class from it:

from spiders import TwitchSpider

You then have access to the spider class and can create a spider object:

spi = TwitchSpider()

You can then call methods on that object, such as

spi.parse(response)

(note that parse expects a Response object). This article shows how to import classes and functions from other Python files: https://csatlas.com/python-import-file-module/
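If you call parse directly like this, outside a running crawl, you have to supply the Response yourself. A minimal sketch, assuming you just want to exercise the parsing logic against HTML you already have (the HTML string here is made up for illustration):

from scrapy.http import HtmlResponse

from spiders import TwitchSpider  # assumes spiders.py sits next to this file

# Build a fake response from HTML fetched or saved earlier.
html = '<div class="tw-tower"><p>Some clip title</p></div>'
response = HtmlResponse(
    url='https://www.twitch.tv/wilbursoot/clips?filter=clips&range=7d',
    body=html,
    encoding='utf-8',
)

spider = TwitchSpider()
for item in spider.parse(response):  # parse is a generator, so iterate it
    print(item)

This only runs the parsing code; it does not perform any HTTP requests, so it is mainly useful for testing selectors. For a real crawl, use CrawlerProcess as in the other answer.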

Upvotes: 0
