Reputation: 1474
I am trying to update 100's of objects in my Job which is scheduled every two hours.
I have articles table in my Model. All articles are parsed and then different attributes are saved for each article.
First i query to get all unparsed articles and then parse each URL which is saved against article and save the received attributes. Below is my code
articles = Articles.objects.filter(status = 0) #100's of articles
for art in articles:
try:
url = art.link
result = ArticleParser(URL) #Custom function which will do all the parsing
art.author = result.articleauthor
art.description = result.articlecontent[:5000]
art.imageurl = result.articleImage
art.status = 1
art.save()
except Exception as e:
art.author = ""
art.description = ""
art.imageurl = ""
art.status = 2
art.save()
The thing is when this job is running CPU utilization is very high also DB process utilization is very high. I am trying to pin point when and where it spikes.
Question: Is this the right way to update multiple objects or is there any better way to do it? Any suggestions. Appreciate your help. Regards
Edit 1: Sorry for the confusion. There is some explanation to do. The fields like author, desc etc they will be different for every article they will be returned after i parse the URL. The reason i am updating in loop is because these fields will be different for every iteration according to the URL. I have updated the code i hope it helps clearing the confusion.
Upvotes: 0
Views: 75
Reputation: 1082
1.Better not to use 'Exception', need to specify concretely: KeyError, IndexError etc.
2.Data can be created once. Something like this:
data = dict(
author=articleauthor,
description=articlecontent[:5000],
imageurl=articleImage,
status=1
)
Articles.objects.filter(status=0).update(**data)
To Edit 1: Probably want to set up a periodic tasks celery. That is, for each query to a separate task. For help see this documentation.
Upvotes: 1
Reputation: 29967
You are doing 100s of DB operations in a relatively tight loop, so it is expected that there is some load on the DB.
From my understanding, you do NOT want to set the fields author, description and imageurl to same value on all articles, so QuerySet.update won't work for you.
Upvotes: 2
Reputation: 2044
Django recommends this way when you want to update or delete multi-objects: https://docs.djangoproject.com/en/1.6/topics/db/optimization/#use-queryset-update-and-delete
Upvotes: 1