ShellRox

Reputation: 2602

How to handle the inefficiency of a Django QuerySet?

I've looked at other questions, but unfortunately I couldn't find a similar problem yet.

The data is roughly 1 GB in size, but the number of objects I'm iterating through is very large. (I can't determine the exact count; the shell is killed within minutes of executing len(model.objects.all()).)

Since the process is killed just by trying to get the length (using the len function; I also tried the count() method, but it seems limited to a certain extent), I suspected that searching through the objects would be hopeless, especially searching through them with similarity algorithms.
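For what it's worth, my understanding is that len(queryset) forces Django to pull every row into memory before counting, while count() is translated into a single SELECT COUNT(*) executed by the database:

# len() evaluates the entire queryset in memory -- this is what kills the shell
total = len(model.objects.all())

# count() issues SELECT COUNT(*) and returns a single integer
total = model.objects.count()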

But I still tried it. I used cosine similarity to find the best match; this is the search code:

zenith_sim = ("", 0)  # (best answer so far, its similarity score)
for qst in Question.objects.all().iterator():
    sim = cosine_similarity(qst.question, string)
    if sim > zenith_sim[1]:
        zenith_sim = (qst.answer, sim)
    if sim > 0.75:  # good enough match; stop iterating
        break
return str(zenith_sim[0])

The code above searches for the stored question most similar to the user's string; to avoid pointless iteration, it breaks out of the loop once the similarity exceeds 75%. I also used the iterator() method hoping it would save some memory.
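One variation I considered (a sketch, assuming Django 2.0+ for the chunk_size argument) is to fetch only the two text columns as plain tuples instead of full model instances, streaming them in chunks:

# Stream (question, answer) tuples in chunks instead of building a model
# instance per row; this keeps the working set small.
zenith_sim = ("", 0)
rows = Question.objects.values_list("question", "answer").iterator(chunk_size=2000)
for question_text, answer_text in rows:
    sim = cosine_similarity(question_text, string)
    if sim > zenith_sim[1]:
        zenith_sim = (answer_text, sim)
    if sim > 0.75:  # good enough match; stop scanning
        break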

As expected, the process was killed a few minutes after execution. I'm not sure how this can be improved. The machine itself is not slow, although it cannot be classified as a supercomputer.


Large organizations run similarity queries over 100+ petabytes of data in seconds.

I wonder what could be used to make the similarity query more efficient; searching through this data causes the process to be killed (presumably it runs out of memory). What would an efficient query look like? Would searching directly in the database be much more efficient?
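For instance, would something along these lines be the right direction? (A sketch assuming PostgreSQL with the pg_trgm extension enabled; note that trigram similarity is a different metric from cosine similarity.)

from django.contrib.postgres.search import TrigramSimilarity

# Let the database compute the similarity and sort, returning only the best row.
best = (
    Question.objects
    .annotate(sim=TrigramSimilarity("question", string))
    .filter(sim__gt=0.3)
    .order_by("-sim")
    .first()
)
answer = best.answer if best else ""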

Upvotes: 1

Views: 220

Answers (1)

Rambarun Komaljeet

Reputation: 646

(Pardon me if I don't quite understand what you are trying to do.) Did you try this to count the rows returned?

model.objects.all().count()

Also, why not use model.objects.filter() to limit the amount of data retrieved?
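For example (keyword is hypothetical; the idea is to narrow the candidate set in the database before any similarity scoring):

# Only rows whose question contains the keyword are fetched.
candidates = Question.objects.filter(question__icontains=keyword)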

Maybe you should use the Django Debug Toolbar to find the bottlenecks in those queries.

Upvotes: 1
