planet260

Reputation: 1474

Django Model: Best Approach to update?

I am trying to update hundreds of objects in a job that is scheduled to run every two hours.

I have an Articles table in my model. Every article is parsed, and different attributes are then saved for each one.

First I query for all unparsed articles, then I parse the URL saved against each article and save the attributes I get back. Below is my code:

articles = Articles.objects.filter(status=0)  # hundreds of unparsed articles
for art in articles:
    try:
        url = art.link
        result = ArticleParser(url)  # custom function which does all the parsing
        art.author = result.articleauthor
        art.description = result.articlecontent[:5000]
        art.imageurl = result.articleImage
        art.status = 1  # parsed successfully
        art.save()
    except Exception:
        # parsing failed: clear the fields and mark the article as failed
        art.author = ""
        art.description = ""
        art.imageurl = ""
        art.status = 2
        art.save()

The problem is that while this job is running, CPU utilization is very high, and so is the DB process utilization. I am trying to pinpoint when and where it spikes.

Question: Is this the right way to update multiple objects, or is there a better way to do it? Any suggestions are appreciated. Regards

Edit 1: Sorry for the confusion; some explanation is needed. Fields like author, description etc. are different for every article, and they are only known after I parse the URL. I am updating in a loop because these fields change on every iteration, depending on the URL. I have updated the code and hope it helps clear up the confusion.

Upvotes: 0

Views: 75

Answers (3)

crazyzubr

Reputation: 1082

1. It is better not to catch the generic 'Exception'; name the concrete exceptions you expect instead: KeyError, IndexError etc.

2. The data can be built once and applied in a single query. Something like this:

data = dict(
    author=articleauthor,
    description=articlecontent[:5000],
    imageurl=articleImage,
    status=1
)

Articles.objects.filter(status=0).update(**data)

To Edit 1: You probably want to set up periodic tasks with Celery, that is, move each query into a separate task. For help, see the Celery documentation.
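
On point 1, here is a minimal sketch of catching only the failures you expect. `ParseError`, `parse_article` and `fake_parser` are hypothetical names made up for this example (they are not part of Django or of the asker's `ArticleParser`), and I am assuming the parser returns a dict. The idea is that an unexpected error, such as a NameError from a mistyped variable, is not silently recorded as status 2 but surfaces in the job's logs.

```python
class ParseError(Exception):
    """Hypothetical error raised when an article cannot be parsed."""


EMPTY_FIELDS = {"author": "", "description": "", "imageurl": ""}


def parse_article(url, parser):
    """Return (fields, status): status 1 on success, 2 on an expected failure."""
    try:
        result = parser(url)
        fields = {
            "author": result["author"],
            "description": result["content"][:5000],
            "imageurl": result["image"],
        }
        return fields, 1
    except (ParseError, KeyError):
        # Only the failures we expect are handled here. Anything else
        # (NameError, TypeError, ...) propagates to the caller.
        return dict(EMPTY_FIELDS), 2


def fake_parser(url):
    """Stand-in for the real parser, used only to exercise the sketch."""
    if "bad" in url:
        raise ParseError(url)
    return {"author": "Ann", "content": "x" * 6000, "image": "http://example.com/a.png"}
```

Calling `parse_article("http://example.com/ok", fake_parser)` gives the parsed fields with status 1, while a URL containing "bad" gives the empty fields with status 2.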

Upvotes: 1

Daniel Hepper

Reputation: 29967

You are doing 100s of DB operations in a relatively tight loop, so it is expected that there is some load on the DB.

  1. If you have a lot of articles, make sure you have an index on the status column to avoid a table scan.
  2. You can try disabling autocommit and wrapping the whole update in one transaction instead.

From my understanding, you do NOT want to set the fields author, description and imageurl to the same value on all articles, so QuerySet.update won't work for you.
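
To illustrate both points at the database level, here is a rough sketch that uses the standard-library `sqlite3` module as a stand-in rather than the Django ORM, so that it runs on its own. The table layout, the index name and the fake "parsing" step are assumptions made for the example. In Django itself (1.6 and later) the corresponding tool for the single transaction would be `transaction.atomic()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, link TEXT, author TEXT, status INTEGER)"
)
# Point 1: an index on status, so the status = 0 lookup avoids a table scan.
conn.execute("CREATE INDEX idx_articles_status ON articles (status)")
conn.executemany(
    "INSERT INTO articles (link, author, status) VALUES (?, '', 0)",
    [("http://example.com/%d" % i,) for i in range(300)],
)
conn.commit()


def update_unparsed(conn):
    """Parse every unparsed article, then write all results in one transaction."""
    rows = conn.execute("SELECT id, link FROM articles WHERE status = 0").fetchall()
    # The (slow) parsing happens first; here it is faked with a string.
    updates = [("author of " + link, 1, art_id) for art_id, link in rows]
    # Point 2: 'with conn' commits once at the end, or rolls back on an
    # exception, instead of committing after every single row.
    with conn:
        conn.executemany(
            "UPDATE articles SET author = ?, status = ? WHERE id = ?", updates
        )
    return len(updates)
```

Each row still gets its own values, which is the part QuerySet.update cannot do; only the commit is shared.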

Upvotes: 2

anhtran

Reputation: 2044

Django recommends this approach when you want to update or delete multiple objects: https://docs.djangoproject.com/en/1.6/topics/db/optimization/#use-queryset-update-and-delete

Upvotes: 1
