Reputation: 155
Trying to launch Scrapy from a .py file with this command:
py myproject.py -f C:\Users\admin\Downloads\test.csv
Here is my file, named "myproject.py":
import argparse
import spiders.ggspider as MySpiders

# argument parsing reconstructed from the command line above
parser = argparse.ArgumentParser()
parser.add_argument('-f', '--file')
args = parser.parse_args()

dataFile = args.file
myData = CSVReader.getAnimalList(dataFile)  # returns an array (CSVReader import not shown)

leSpider = MySpiders.GGSpider()
leSpider.myList = myData
leSpider.start_requests()
Here is my spider file:
import scrapy
import urllib.parse

class GGSpider(scrapy.Spider):
    name = "spiderman"
    domain = "https://www.google.fr/?q={}"
    myList = []

    def __init__(self):
        pass

    def start_requests(self):
        for leObject in self.myList:
            # note: tmpURL is built here but never used by the request below
            tmpURL = self.domain.format(urllib.parse.urlencode({'text': leObject[0]}))
            yield scrapy.Request(url=self.domain + leObject[0], callback=self.parse)

    def parse(self, response):
        print('hello')
        print(response)
My problem is: I do get into start_requests (I put a print before the yield and saw it in the console), but the callback does not seem to happen (I never get the 'hello' print).
I really don't know why (I'm new to Python, maybe I'm missing something obvious).
Upvotes: 0
Views: 354
Reputation: 1379
I guess that's because a generator doesn't actually run until you retrieve its values. You could try to consume the generator somehow:
import spiders.ggspider as MySpiders

# getAnimalList returns an array
dataFile = args.file
myData = CSVReader.getAnimalList(dataFile)

leSpider = MySpiders.GGSpider()
leSpider.myList = myData
for request in leSpider.start_requests():
    do_something(request)
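As a minimal sketch of that laziness (plain Python, not from the original post): a generator's body does not execute until something iterates it.

def gen():
    print('inside')  # runs only once the generator is consumed
    yield 1

g = gen()   # nothing is printed yet: calling gen() only creates the generator object
next(g)     # now 'inside' is printed

Note, though, that consuming start_requests() by hand only builds scrapy.Request objects; nothing downloads them, so parse() would still never fire. Letting Scrapy's engine schedule the requests, as below, is the proper fix.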
UPD: Here is a better example of running a spider from a script:
import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess(settings={
    "FEEDS": {
        "items.json": {"format": "json"},
    },
})

process.crawl(MySpider)
process.start()  # the script will block here until the crawling is finished
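For this question specifically, a hedged sketch of wiring the CSV data in: CrawlerProcess.crawl() forwards keyword arguments to the spider, so myList could be passed that way (this assumes GGSpider's empty __init__ is removed, or forwards **kwargs to super().__init__(), so the base Spider can apply them):

from scrapy.crawler import CrawlerProcess
import spiders.ggspider as MySpiders

# myData built from the CSV exactly as in the question
process = CrawlerProcess()
process.crawl(MySpiders.GGSpider, myList=myData)  # kwargs reach the spider instance
process.start()  # the engine downloads the requests, so parse() is actually called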
Upvotes: 1