Reputation: 862
I want to run my crawler with Celery every 1 minute. I wrote the task as below and called it in the view with delay, but I am not getting the result.
I run celery -A mysite worker -l info, with Celery, the RabbitMQ broker, Scrapy and the Django server each in a different terminal.
The CrawlerHomeView redirects to the task list successfully by creating the task object, but Celery is not working. It is throwing this error in the Celery console:
[2020-06-08 15:36:06,732: INFO/MainProcess] Received task: crawler.tasks.schedule_task[3b537143-caa8-4445-b3d6-c0bc8d301b89]
[2020-06-08 15:36:06,735: ERROR/MainProcess] Task handler raised error: ValueError('not enough values to unpack (expected 3, got 0)')
Traceback (most recent call last):
  File "....\venv\lib\site-packages\billiard\pool.py", line 362, in workloop
    result = (True, prepare_result(fun(*args, **kwargs)))
  File "....\venv\lib\site-packages\celery\app\trace.py", line 600, in _fast_trace_task
    tasks, accept, hostname = _loc
ValueError: not enough values to unpack (expected 3, got 0)
views.py
class CrawlerHomeView(LoginRequiredMixin, View):
    login_url = 'users:login'

    def get(self, request, *args, **kwargs):
        frequency = Task()
        categories = Category.objects.all()
        targets = TargetSite.objects.all()
        keywords = Keyword.objects.all()
        form = CreateTaskForm()
        context = {
            'targets': targets,
            'keywords': keywords,
            'frequency': frequency,
            'form': form,
            'categories': categories,
        }
        return render(request, 'index.html', context)

    def post(self, request, *args, **kwargs):
        form = CreateTaskForm(request.POST)
        if form.is_valid():
            unique_id = str(uuid4())  # create a unique ID.
            obj = form.save(commit=False)
            obj.created_by = request.user
            obj.unique_id = unique_id
            obj.status = 0
            obj.save()
            form.save_m2m()
            schedule_task.delay(obj.pk)
        return render(request, 'index.html', {'form': form, 'errors': form.errors})
tasks.py
scrapyd = ScrapydAPI('http://localhost:6800')

@periodic_task(run_every=crontab(minute=1))  # how to do with task search_frequency value ?
def schedule_task(pk):
    task = Task.objects.get(pk=pk)
    if task.status == 0 or task.status == 1 and not datetime.date.today() >= task.scraping_end_date:
        unique_id = str(uuid4())  # create a unique ID.
        keywords = ''
        # for keys in ast.literal_eval(obj.keywords.all()):  # keywords change to csv
        for keys in task.keywords.all():
            if keywords:
                keywords += ', ' + keys.title
            else:
                keywords += keys.title
        settings = {
            'spider_count': len(task.targets.all()),
            'keywords': keywords,
            'unique_id': unique_id,  # unique ID for each record for DB
            'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
        }
        # res = ast.literal_eval(ini_list)
        for site_url in task.targets.all():
            domain = urlparse(site_url.address).netloc  # parse the url and extract the domain
            spider_name = domain.replace('.com', '')
            scrapyd.schedule('default', spider_name, settings=settings,
                             url=site_url.address, domain=domain, keywords=keywords)
    elif task.scraping_end_date == datetime.date.today():
        task.status = 2
        task.save()  # change the task status as completed.
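One detail worth flagging about the "every 1 minute" goal: with Celery's crontab schedule, minute=1 does not mean "every minute", it means "at minute 1 of every hour". A minimal illustration (standard celery.schedules.crontab, nothing project-specific):

from celery.schedules import crontab

once_per_hour = crontab(minute=1)      # fires at HH:01, i.e. once per hour
every_minute = crontab(minute='*/1')   # fires every minute; crontab() with no args does the same
print(once_per_hour)
print(every_minute)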
settings.py
CELERY_BROKER_URL = 'amqp://localhost'
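For context, periodic tasks are only dispatched while the beat scheduler is running in addition to the worker (e.g. celery -A mysite beat -l info in its own terminal). Below is a rough sketch of the usual Django/Celery wiring implied by a command like celery -A mysite worker; the mysite/celery.py module path, the schedule entry name and the args value are illustrative assumptions, not code from the post:

import os

from celery import Celery
from celery.schedules import crontab

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

app = Celery('mysite')
# Reads CELERY_*-prefixed settings (such as CELERY_BROKER_URL above) from Django settings.
app.config_from_object('django.conf:settings', namespace='CELERY')
app.autodiscover_tasks()

# Periodic tasks fire only while `celery -A mysite beat -l info` is running.
app.conf.beat_schedule = {
    'schedule-crawler-every-minute': {
        'task': 'crawler.tasks.schedule_task',
        'schedule': crontab(minute='*/1'),
        'args': (1,),  # illustrative pk; the real call passes obj.pk from the view
    },
}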
EDIT
This answer helped me find the solution: Celery raises ValueError: not enough values to unpack.
That error is now gone.
Now in the Celery console I am seeing this:
[2020-06-08 16:33:23,123: INFO/MainProcess] Task crawler.tasks.schedule_task[0578558d-0dc6-4db7-b69f-e912b604ff3d] succeeded in 0.016000000000531145s: None
but I am getting no scraped results in my frontend.
Now my question is: how can I check that my task is running periodically, every 1 minute?
This is the very first time I am using Celery, so there might be some problems here.
Upvotes: 0
Views: 277
Reputation: 15738
Celery is no longer officially supported on Windows as a platform (version 4 dropped official support).
I highly suggest that you dockerize your app instead (or use WSL2). If you don't want to go that route, you would probably need to use gevent (note that there could be some additional problems with that approach):
pip install gevent
celery -A <module> worker -l info -P gevent
I found a similar, more detailed answer here.
Upvotes: 1