Ahmed Elemam

Reputation: 416

Celery worker concurrency

I have made a scraper to scan around 150 links. Each link has around 5k sub-links to get info from.

I am using Celery to run the scraper in the background and store the data with the Django ORM. I use BeautifulSoup to scrape each URL.

When I run Celery with this command

celery worker -A ... --concurrency=50

everything works fine, but worker processes 1 to 50 sleep.

How can I keep Celery working until the scraper finishes its task?

Upvotes: 3

Views: 9084

Answers (1)

Glenn D.J.

Reputation: 1965

First of all, that command will not start 50 workers, but 1 worker with 50 processes. I'd also recommend just using as many processes as you have cores available. (Let's say 8 for the rest of my answer.)
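For example, the invocation would look roughly like this (a sketch in the same command form you used; "your_project" is a placeholder for your actual app module):

celery worker -A your_project --concurrency=8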

My guess here is that the other processes are idle because you only run a single task. If you want to do concurrent work, you'll have to split up your work into parts that can be executed concurrently. The easiest way to do this is to make a separate task for every link you want to scrape. The worker will then start scraping 8 links, and whenever it finishes one it will pick up the next, until it has finished scraping all 150.

So the code that calls your task should look roughly like:

for link in links:
    # queue a separate task for each of the 150 links
    scrape_link.delay(link)

with scrape_link being your task function, which will look something like:

@app.task
def scrape_link(link):
    # scrape the link and its sub-links
    ...
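As a rough sketch of what the body of that task could look like with requests, BeautifulSoup and the Django ORM (ScrapedItem and its url/title fields are hypothetical placeholders for your own model, and "your_project" for your Celery app module):

import requests
from bs4 import BeautifulSoup

from your_project.celery import app          # your Celery app instance
from your_app.models import ScrapedItem      # hypothetical Django model


@app.task
def scrape_link(link):
    # fetch the page and parse it with BeautifulSoup
    response = requests.get(link)
    soup = BeautifulSoup(response.text, "html.parser")

    # collect the sub-links (here: every <a href=...> on the page)
    sub_links = [a["href"] for a in soup.find_all("a", href=True)]

    for sub_link in sub_links:
        sub_response = requests.get(sub_link)
        sub_soup = BeautifulSoup(sub_response.text, "html.parser")

        # store whatever info you need through the Django ORM
        ScrapedItem.objects.create(
            url=sub_link,
            title=sub_soup.title.string if sub_soup.title else "",
        )

Each call to scrape_link.delay(link) puts one of these tasks on the queue, so the 8 processes can each work on a different link at the same time.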

Upvotes: 4
