How to annotate/aggregate each item in a list without for loops (Django)

Question

I have a list, s, that is stored in the model field "sentence" of class Label. A second model field, "label", holds each item of s, with one post (row) per item in s. I want to annotate or aggregate the items in "label" that belong to list s with the most frequent value of a third field, "labelname". For instance, say the sentence field holds the list s = ["a", "green", "car"]. For each element of s, "a" for instance, I want to count how often each "labelname" occurs across all posts that have "a" as label and s as sentence, and take the maximum. Is there a better way to do this than looping over the elements of s and annotating or aggregating each one with "labelname" and "label"?

For each element of s ("a", "green", "car"), considering only posts whose sentence is s (they are saved in class Label: one post with "a" in field label and list s in field sentence, a second post with "green" in field label and list s in sentence, and so on), I want to annotate the element with the labelname that occurs most often for it. For instance, "a" gets labelname "A" if the posts with label "a" and labelname "A" saved in the db outnumber the posts with label "a" and labelname "B".

# I've retrieved the id for sentence s via the label "a"
str_ = "a"
t = Label.objects.filter(label__startswith=str_).first()
# get the sentence that t is associated with
s = OneLabelingPCS.objects.get(pk=int(t.id)).sentence

# This gives me pk=int(t.id) for one post in which "a" and the sentence occur. I'd like all posts ("a", "green", "a car") with the sentence s and the maximum labelname.

# in models.py
class Label(models.Model):
    sentence = models.CharField(max_length=200)  # <-- contains list s
    label = models.CharField(max_length=200)  # <-- contains each item in s, one item per post
    labelname = models.CharField(max_length=200)

little_birdie · Accepted Answer

As far as Django is concerned, 'sentence' is a string, and it is stored in the database as a string too, so neither Django nor the database has any understanding of the elements of the list you are putting in there.
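To make that concrete, here is a small plain-Python sketch. It assumes the list was turned into text with str() before being saved (the question does not say how it was serialized, so that part is a guess). It shows that a text search on the stored value cannot tell real list elements from fragments of them, and that the text has to be parsed before the elements are usable again.

```python
import ast

# Assumption: the list was saved by calling str() on it first.
s = ["a", "green", "car"]
stored = str(s)  # what the CharField would hold: plain text

# "gre" is not an element of s, yet it is a substring of the stored text,
# so a contains-style text search would report a false match.
substring_hit = "gre" in stored
element_hit = "gre" in s

# Getting the elements back means parsing the text in Python.
recovered = ast.literal_eval(stored)
```

The database only ever sees the value of `stored`, which is why it cannot group or count by individual words.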

The traditional way of doing this would be to have a second table that contains all the words, eg:

from django.db import models

class Label(models.Model):
    label = models.CharField(max_length=200)
    labelname = models.CharField(max_length=200)

class LabelWord(models.Model):
    word = models.CharField(max_length=30)
    position = models.IntegerField()
    # on_delete is a required argument since Django 2.0
    label = models.ForeignKey(Label, on_delete=models.CASCADE)

So for each Label that you insert, also insert the LabelWord records, eg:

label = Label(label="fooo", labelname="FOO Name")
label.save()

# enumerate() supplies the position of each word, starting at 0
for position, word in enumerate(('a', 'green', 'car')):
    LabelWord(label=label, word=word, position=position).save()

OK, now you want to find all the labels with the word 'car'? Django doesn't make it obvious how to do this, but here's the easy (but not super efficient) way:

labels = Label.objects.filter(
    pk__in=LabelWord.objects.filter(word='car').values_list('label_id', flat=True)
)

That will work fine for relatively small amounts of data. Search around for "django filter on reverse foreign key"; you will find it is a common problem that people using Django are trying to solve more efficiently.
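For what it's worth, the pk__in filter corresponds to an IN subquery at the database level, and the same rows can be fetched with a join. The sketch below uses the standard-library sqlite3 module rather than Django so that it runs on its own. The table and column names mirror the models above, and the second label ('bar') is made-up sample data. In Django, the join form is usually written with the reverse relation lookup, something like Label.objects.filter(labelword__word='car').distinct(), but that line is not exercised here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE label (id INTEGER PRIMARY KEY, label TEXT, labelname TEXT);
    CREATE TABLE labelword (
        id INTEGER PRIMARY KEY,
        word TEXT,
        position INTEGER,
        label_id INTEGER REFERENCES label(id)
    );
    -- Hypothetical sample rows
    INSERT INTO label (id, label, labelname)
        VALUES (1, 'fooo', 'FOO Name'), (2, 'bar', 'BAR Name');
    INSERT INTO labelword (word, position, label_id) VALUES
        ('a', 0, 1), ('green', 1, 1), ('car', 2, 1),
        ('a', 0, 2), ('red', 1, 2), ('bus', 2, 2);
""")

# Subquery form, roughly what the pk__in filter expresses.
via_subquery = conn.execute(
    "SELECT id, label FROM label "
    "WHERE id IN (SELECT label_id FROM labelword WHERE word = ?) "
    "ORDER BY id",
    ("car",),
).fetchall()

# Join form; DISTINCT guards against duplicates when a word
# appears more than once for the same label.
via_join = conn.execute(
    "SELECT DISTINCT label.id, label.label FROM label "
    "JOIN labelword ON labelword.label_id = label.id "
    "WHERE labelword.word = ? ORDER BY label.id",
    ("car",),
).fetchall()
```

Both queries return only the 'fooo' label here. Which form is faster depends on the database and the indexes, so it is worth measuring on real data.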

Another thing I will add is that you could do this with a ManyToMany relationship, thus storing each unique word only once. That is more efficient in some ways, less efficient in others.
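Coming back to the counting part of the question: once each word is its own row, the counting can be left to the database as a single grouped query instead of one query per word. This is a minimal sketch, again with sqlite3 so it runs standalone. The table mirrors the question's Label model, and the rows and the stored sentence text are invented for illustration. In Django, a grouped count like this is normally written with values() followed by annotate(Count(...)), as noted in the comment, though that line is not run here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE label ("
    "id INTEGER PRIMARY KEY, sentence TEXT, label TEXT, labelname TEXT)"
)

s = "a green car"  # hypothetical stored value of the sentence field
rows = [  # hypothetical posts: (sentence, label, labelname)
    (s, "a", "A"), (s, "a", "A"), (s, "a", "B"),
    (s, "green", "C"),
    (s, "car", "D"), (s, "car", "D"), (s, "car", "E"),
]
conn.executemany(
    "INSERT INTO label (sentence, label, labelname) VALUES (?, ?, ?)", rows
)

# One query counts every (label, labelname) pair for the sentence.
# Rough Django equivalent (not run here):
#   Label.objects.filter(sentence=s)
#        .values('label', 'labelname').annotate(n=Count('id'))
counts = conn.execute(
    "SELECT label, labelname, COUNT(*) AS n FROM label "
    "WHERE sentence = ? GROUP BY label, labelname "
    "ORDER BY label, n DESC, labelname",
    (s,),
).fetchall()

# Rows arrive sorted by count within each label, so the first row seen
# for a label is its most frequent labelname. This loop walks the
# grouped result once; it does not issue further queries.
best = {}
for label, labelname, n in counts:
    best.setdefault(label, (labelname, n))
```

With this sample data, "a" ends up with labelname "A" because it occurs twice against once for "B". Ties are broken alphabetically by labelname, which is an arbitrary choice made only to keep the result deterministic.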
