Reputation: 2659
I hope the title is not misleading.
Anyway, I have two models, both have m2m relationships with a third model.
class Model1: keywords = m2m(Keyword)
class Model2: keywords = m2m(Keyword)
Given the keywords for a Model2 instance like this:
keywords2 = model2_instance.keywords.all()
I need to retrieve the Model1 instances which have at least a keyword that is in keywords2, something like:
Model1.objects.filter(keywords__in=keywords2)
and sort them by the number of keywords that match (dont think its possible via 'in' field lookup). Question is, how do i do this?
I'm thinking of just manually interating through each of Model1 instances, appending them to a dictionary of results for every match, but I need this to scale, for say tens of thousands of records. Here is how I imagined it would be like:
result = {}
keywords2_ids = model2.keywords.all().values_list('id',flat=True)
for model1 in Model1.objects.all():
keywords_matched = model1.keywords.filter(id__in=keywords2_ids).count()
objs = result.get(str(keywords_matched), [])
result[str(keywords_matched)] = objs.append(obj)
There must be an faster way to do this. Any ideas?
Upvotes: 0
Views: 242
Reputation: 1118
You can just switch to raw SQL. What you have to do is to write a custom manager for Model1
to return the sorted set of ids of Model1 objects based on the keyword match counts. The SQL is simple as joining the two many to many tables(Django automatically creates a table to represent a many to many relationship) on keyword ids and then grouping on Model1 ids for COUNT
sql function. Then using an ORDER BY
clause on those counts will produce the sorted Model1
id list you need. In MySQL,
SELECT appname_model1_keywords.model1_id, count(*) as match_count FROM appname_model1_keywords
JOIN appname_model2_keywords
ON (appname_model1_keywords.keyword_id = appname_model2_keywords.keyword_id)
WHERE appname_model2_keywords.model2_id = model2_object_id
GROUP BY appname_model1_keywords.model1_id
ORDER BY match_count
Here model2_object_id
is the model2_instance
id. This will definitely be faster and more scalable.
Upvotes: 2