D_P

Reputation: 862

Scheduling my crawler with celery not working

Here I want to run my crawler with Celery every 1 minute. I wrote the task as below and call it in the view with delay, but I am not getting the result.

I run celery -A mysite worker -l info, the RabbitMQ broker, Scrapy and the Django server in separate terminals. The CrawlerHomeView redirects to the task list successfully after creating the task object, but Celery is not working.

It is throwing this error in the celery console:

[2020-06-08 15:36:06,732: INFO/MainProcess] Received task: crawler.tasks.schedule_task[3b537143-caa8-4445-b3d6-c0bc8d301b89]
[2020-06-08 15:36:06,735: ERROR/MainProcess] Task handler raised error: ValueError('not enough values to unpack (expected 3, got 0)')
Traceback (most recent call last):
  File "....\venv\lib\site-packages\billiard\pool.py", line 362, in workloop
    result = (True, prepare_result(fun(*args, **kwargs)))
  File "....\venv\lib\site-packages\celery\app\trace.py", line 600, in _fast_trace_task
    tasks, accept, hostname = _loc
ValueError: not enough values to unpack (expected 3, got 0)
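For completeness, the Celery app itself is wired up in the standard Django way; this is a minimal sketch of mysite/celery.py (the module and project names are assumed from the celery -A mysite invocation above):

import os
from celery import Celery

# point Celery at the Django settings module ('mysite' assumed from `celery -A mysite`)
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

app = Celery('mysite')
# read CELERY_* settings from Django's settings.py
app.config_from_object('django.conf:settings', namespace='CELERY')
# discover tasks.py modules in installed apps (e.g. crawler.tasks)
app.autodiscover_tasks()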

views

class CrawlerHomeView(LoginRequiredMixin, View):
    login_url = 'users:login'

    def get(self, request, *args, **kwargs):
        frequency = Task()
        categories = Category.objects.all()
        targets = TargetSite.objects.all()
        keywords = Keyword.objects.all()

        form = CreateTaskForm()
        context = {
            'targets': targets,
            'keywords': keywords,
            'frequency': frequency,
            'form':form,
            'categories': categories,
        }
        return render(request, 'index.html', context)
    
    def post(self, request, *args, **kwargs):
        
        form = CreateTaskForm(request.POST)
        if form.is_valid():
            unique_id = str(uuid4()) # create a unique ID. 
            obj = form.save(commit=False)
            obj.created_by = request.user
            obj.unique_id = unique_id
            obj.status = 0
            obj.save()
            form.save_m2m()       
            schedule_task.delay(obj.pk)
        return render(request, 'index.html', {'form':form, 'errors':form.errors})

tasks.py

# imports inferred from the code below
import datetime
from urllib.parse import urlparse
from uuid import uuid4

from celery.schedules import crontab
from celery.task import periodic_task
from scrapyd_api import ScrapydAPI

from .models import Task

scrapyd = ScrapydAPI('http://localhost:6800')
@periodic_task(run_every=crontab(minute=1))  # how to do  with task search_frequency value ?
def schedule_task(pk):
    task = Task.objects.get(pk=pk)
    if task.status == 0 or task.status == 1 and not datetime.date.today() >= task.scraping_end_date:
        unique_id = str(uuid4())  # create a unique ID.
        keywords = ''
        # for keys in ast.literal_eval(obj.keywords.all()): #keywords change to csv
        for keys in task.keywords.all():
            if keywords:
                keywords += ', ' + keys.title
            else:
                keywords += keys.title

        settings = {
            'spider_count': len(task.targets.all()),
            'keywords': keywords,
            'unique_id': unique_id,  # unique ID for each record for DB
            'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
        }

        # res = ast.literal_eval(ini_list)

        for site_url in task.targets.all():
            domain = urlparse(site_url.address).netloc  # parse the url and extract the domain
            spider_name = domain.replace('.com', '')
            scrapyd.schedule('default', spider_name, settings=settings, url=site_url.address, domain=domain,
                                    keywords=keywords)

    elif task.scraping_end_date == datetime.date.today():
        task.status = 2
        task.save()  # change the task status as completed.

settings

CELERY_BROKER_URL = 'amqp://localhost'

EDIT

This answer helped me to find the solution: Celery raises ValueError: not enough values to unpack.
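For reference, the usual workaround for this error on Windows (and, as far as I understand, what that answer suggests) is setting the FORKED_BY_MULTIPROCESSING environment variable before the worker starts; a minimal sketch, assuming it goes at the top of mysite/celery.py:

import os

# Windows workaround for "not enough values to unpack" raised in billiard's worker pool
os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')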

Now this error has gone. Now in the celery console I am seeing this:

[2020-06-08 16:33:23,123: INFO/MainProcess] Task crawler.tasks.schedule_task[0578558d-0dc6-4db7-b69f-e912b604ff3d] succeeded in 0.016000000000531145s: None

and I am getting no scraped results in my frontend.

Now my question is: how can I check that my task is running periodically every 1 minute?

It is the very first time I am using Celery, so there might be some problems here.

Upvotes: 0

Views: 277

Answers (1)

iklinac

Reputation: 15738

Celery is no longer officially supported on Windows as a platform (version 4 dropped official support).

I highly suggest that you dockerize your app instead (or use WSL2). If you don't want to go that route, you would probably need to use gevent (note that there could be some additional problems if you go this way):

pip install gevent
celery -A <module> worker -l info -P gevent
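If gevent causes problems of its own, the built-in solo pool is another option that works on Windows for local development (it runs tasks one at a time in the main process, so it is only suitable for debugging):

celery -A <module> worker -l info -P solo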

I found a similar detailed answer here.

Upvotes: 1
