ShellRox

Reputation: 2602

How to handle the inefficiency of a Django QuerySet?

I've looked at other questions, but unfortunately I couldn't find a similar problem yet.

The data is roughly 1 GB in size, but the number of objects I'm iterating through is very large. (I can't determine the exact count; the shell is killed within minutes of executing len(model.objects.all()).)

Since the process is killed just by trying to get the length (using the len function; I also tried the count() method, but it seems limited to a certain extent), I suspected that searching through the objects would be hopeless, especially searching through them with similarity algorithms.
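For what it's worth, my understanding is that len(queryset) forces Django to pull every row into memory before counting, while count() is translated into a single SELECT COUNT(*) executed by the database:

# len() evaluates the entire queryset in memory -- this is what kills the shell
total = len(model.objects.all())

# count() issues SELECT COUNT(*) and returns a single integer
total = model.objects.count()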

But I still tried it. I used cosine similarity to find the best match; this is the search code:

zenith_sim = ("", 0)  # (best answer so far, its similarity score)
for qst in Question.objects.all().iterator():
    sim = cosine_similarity(qst.question, string)
    if sim > zenith_sim[1]:
        zenith_sim = (qst.answer, sim)
    if sim > 0.75:  # good enough match; stop iterating
        break
return str(zenith_sim[0])

The code above searches for the stored question most similar to the user's string; to avoid pointless iteration, it breaks out of the loop once the similarity exceeds 75%. I also used the iterator() method hoping it would save some memory.
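One variation I considered (a sketch, assuming Django 2.0+ for the chunk_size argument) is to fetch only the two text columns as plain tuples instead of full model instances, streaming them in chunks:

# Stream (question, answer) tuples in chunks instead of building a model
# instance per row; this keeps the working set small.
zenith_sim = ("", 0)
rows = Question.objects.values_list("question", "answer").iterator(chunk_size=2000)
for question_text, answer_text in rows:
    sim = cosine_similarity(question_text, string)
    if sim > zenith_sim[1]:
        zenith_sim = (answer_text, sim)
    if sim > 0.75:  # good enough match; stop scanning
        break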

As expected, the process was killed a few minutes after execution. I'm not sure how this can be improved. The machine itself is not slow, although it cannot be classified as a supercomputer.


Large organizations run similarity queries over 100+ petabytes of data in seconds.

I wonder what could be used to make the similarity query more efficient; searching through this data causes the process to be killed (presumably it runs out of memory). What would an efficient query look like? Would searching directly in the database be much more efficient?
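For instance, would something along these lines be the right direction? (A sketch assuming PostgreSQL with the pg_trgm extension enabled; note that trigram similarity is a different metric from cosine similarity.)

from django.contrib.postgres.search import TrigramSimilarity

# Let the database compute the similarity and sort, returning only the best row.
best = (
    Question.objects
    .annotate(sim=TrigramSimilarity("question", string))
    .filter(sim__gt=0.3)
    .order_by("-sim")
    .first()
)
answer = best.answer if best else ""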

Upvotes: 1

Views: 220

Answers (1)

Rambarun Komaljeet

Reputation: 646

(Pardon me if I don't quite understand what you are trying to do.) Did you try this to count the rows returned?

model.objects.all().count()

Also, why not use model.objects.filter() to limit the amount of data retrieved?
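For example (keyword is hypothetical; the idea is to narrow the candidate set in the database before any similarity scoring):

# Only rows whose question contains the keyword are fetched.
candidates = Question.objects.filter(question__icontains=keyword)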

Maybe you should use the Django Debug Toolbar to find the bottlenecks in those queries.

Upvotes: 1
