fat_potato

Reputation: 653

signal only works in main thread: Scrapy

I am building an API that returns text scraped by Scrapy as a JsonResponse. Run on its own, the Scrapy script works perfectly, but when I integrate it with Django I do not get any output.

All I want is to return the scraped text in response to the request (in my case, a POST request sent from Postman).

Here is the code I am trying:

from django.http import HttpResponse, JsonResponse
from django.views.decorators.csrf import csrf_exempt
import scrapy
from scrapy.crawler import CrawlerProcess


@csrf_exempt
def some_view(request, username):
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
        'LOG_ENABLED': 'false'
    })
    process_test = process.crawl(QuotesSpider)
    process.start()

    return JsonResponse({'return': process_test})


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://quotes.toscrape.com/random',
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        return response.css('.text::text').extract_first()

I am very new to Python and Django. Any help would be much appreciated.

Upvotes: 1

Views: 2228

Answers (1)

Apalala

Reputation: 9224

In your code, process_test is not the output of the crawl. It is the Deferred returned by process.crawl(), which JsonResponse cannot serialize.

You need additional configuration to make your spider store its output "somewhere". See this SO Q&A about writing a custom pipeline.
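
A minimal sketch of such a pipeline, assuming the goal is simply to keep scraped items in memory so they can be read once the crawl finishes. The class name CollectItemsPipeline and the module path used in ITEM_PIPELINES are hypothetical; process_item(item, spider) is the hook Scrapy calls for every item a spider produces. A pipeline only sees items, so parse would have to yield something like {'text': ...} rather than return a bare string.

```python
class CollectItemsPipeline:
    """Hypothetical pipeline that keeps every scraped item in memory."""

    # Class-level list, shared by all instances, so the caller can read
    # CollectItemsPipeline.collected after the crawl has finished.
    collected = []

    def process_item(self, item, spider):
        # Store a plain-dict copy so the result is easy to serialize later.
        self.collected.append(dict(item))
        # Return the item so any later pipelines still receive it.
        return item


# Enabling the pipeline in the crawler settings.
# The dotted path is an assumption about where the class lives.
ITEM_PIPELINES = {
    "myapp.pipelines.CollectItemsPipeline": 300,
}
```

After the crawl, CollectItemsPipeline.collected would hold a list of dicts that can be passed to JsonResponse (with safe=False if the list itself is the top-level value).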

If you just want to synchronously retrieve and parse a single page, you may be better off using requests to retrieve the page, and parsel to parse it.

Upvotes: 0
