Reputation: 416
I have made a scraper to scan around 150 links. Each link has around 5k sub-links to get info from. I am using Celery to run the scraper in the background and store the data with the Django ORM. I use BeautifulSoup to scrape the URLs.
When I run Celery with this command:
celery worker -A ... --concurrency=50
everything works fine, but workers 1 to 50 are sleeping.
How can I keep Celery working until the scraper finishes its task?
Upvotes: 3
Views: 9084
Reputation: 1965
First of all, that command will not start 50 workers, but 1 worker with 50 processes. I'd also recommend using only as many processes as you have cores available. (Let's say 8 for the rest of my answer.)
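For example, on an 8-core machine the start command (keeping the same app argument you already pass) would be:

    celery worker -A ... --concurrency=8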
My guess is that the other processes are idle because you only run a single task. If you want to do concurrent work, you'll have to split the work up into parts that can be executed concurrently. The easiest way to do this is to make a separate task for every link you want to scrape. The worker will then start scraping 8 links, and whenever it finishes one it will pick up the next, until all 150 are done.
So the code that enqueues your tasks should look roughly like:

    for link in links:
        scrape_link.delay(link)
with scrape_link being the task function, which will look something like:

    @app.task
    def scrape_link(link):
        # scrape the link and its sub-links
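Since you mention BeautifulSoup and the Django ORM, a filled-in version of that task could look like the sketch below. It is only a rough illustration under assumptions: requests as the HTTP client, shared_task instead of app.task, and a placeholder Page model with url/title fields that you would swap for your own models and parsing logic.

    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup
    from celery import shared_task

    from myapp.models import Page  # hypothetical model, replace with your own


    @shared_task
    def scrape_link(link):
        # Fetch and parse the top-level link.
        response = requests.get(link, timeout=30)
        soup = BeautifulSoup(response.text, "html.parser")

        # Scrape every sub-link inside this same task, so one task covers
        # one of the 150 links plus its ~5k sub-links.
        for a_tag in soup.find_all("a", href=True):
            sub_url = urljoin(link, a_tag["href"])
            sub_response = requests.get(sub_url, timeout=30)
            sub_soup = BeautifulSoup(sub_response.text, "html.parser")
            title = sub_soup.title.string if sub_soup.title else ""
            # Store the result with the Django ORM (Page is a placeholder).
            Page.objects.create(url=sub_url, title=title)

If one task per top-level link still turns out to be too coarse, you could push each sub-link out as its own task as well, but one task per link is usually enough to keep all processes busy.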
Upvotes: 4