Michael

Reputation: 8788

Django with MySQL: 'Subquery returns more than 1 row'

Using Django with a MySQL database, and given these models:

ModelB   ---FK--->   ModelA
    - ref_type
    - ref_id
 
ModelC

I want to annotate each ModelA with all of its related ModelC objects.

I tried many options from existing solutions but could not make it work. The following code works when there is only one ModelC per ModelA, but as soon as there is more than one, I get the 'Subquery returns more than 1 row' error, and I don't know how to get a list of the ModelC objects instead. Ideally, I'd like to build a list of JSON objects for the ModelC rows.

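from django.db.models import OuterRef, Subquery

qs = ModelA.objects.all()

# ModelB rows of type "c" that belong to the outer ModelA row
c_ids = (
    ModelB.objects
        .filter(modela_id=OuterRef(OuterRef('id')), ref_type='c')
        .values('ref_id')
)
all_c = (
    ModelC.objects
        .filter(id__in=Subquery(c_ids))
        .values('id')
)

# Raises "Subquery returns more than 1 row" as soon as a ModelA has more
# than one ModelC: a subquery used as an annotation must return one value.
qs1 = qs.annotate(all_c=Subquery(all_c))
for p in qs1:
    print(p, p.all_c)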

Upvotes: 3

Views: 1418

Answers (3)

Simon Charette

Reputation: 5116

The following should work:

from django.db.models import JSONField, OuterRef, Subquery
from django.db.models.aggregates import Aggregate

class JSONArrayAgg(Aggregate):
    function = "JSON_ARRAYAGG"
    output_field = JSONField()

ModelA.objects.annotate(
    all_c=Subquery(
        ModelB.objects.filter(
            ref_type="c",
            modela_id=OuterRef("id"),
        ).values(
            "modela_id"  # GROUP BY modela_id
        ).values_list(
            JSONArrayAgg("ref_id")  # one JSON array per ModelA
        )
    )
)

which translates to

SELECT
    model_a.*,
    (SELECT JSON_ARRAYAGG(model_b.ref_id)
     FROM model_b
     WHERE model_b.ref_type = "c" AND model_b.modela_id = model_a.id
     GROUP BY model_b.modela_id
    ) all_c
FROM model_a

But it would be much easier if you provided your exact model definitions, as it's likely only a matter of doing something along the lines of the following (JSONArrayAgg's filter argument can't be used here due to a MySQL bug):

# "modelb" is the default reverse query name for ModelB's FK to ModelA
ModelA.objects.filter(
    modelb__ref_type="c",
).annotate(
    all_c=JSONArrayAgg("modelb__ref_id")
)

which translates to

SELECT
    model_a.*,
    JSON_ARRAYAGG(model_b.ref_id)
FROM model_a
INNER JOIN model_b ON (model_b.modela_id = model_a.id)
WHERE model_b.ref_type = "c"
GROUP BY model_a.id
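Because the ref_type condition sits in the WHERE clause of an INNER JOIN, a ModelA with no ModelB row of type "c" drops out of the result entirely. A small sqlite3 sketch of that SQL shape (again with SQLite's json_group_array as a stand-in for JSON_ARRAYAGG, and hypothetical sample rows) makes this visible:

```python
import json
import sqlite3


def build_db():
    # Hypothetical tables mirroring model_a / model_b from the question
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model_a (id INTEGER PRIMARY KEY);
        CREATE TABLE model_b (
            id INTEGER PRIMARY KEY,
            modela_id INTEGER REFERENCES model_a (id),
            ref_type TEXT,
            ref_id INTEGER
        );
        INSERT INTO model_a (id) VALUES (1), (2), (3);
        INSERT INTO model_b (modela_id, ref_type, ref_id) VALUES
            (1, 'c', 10), (1, 'c', 11), (2, 'c', 12), (2, 'x', 99), (3, 'x', 98);
    """)
    return conn


def all_c_via_join(conn):
    # One aggregated row per model_a that has at least one 'c' row;
    # model_a 3 only has an 'x' row, so the INNER JOIN + WHERE removes it.
    rows = conn.execute("""
        SELECT model_a.id, json_group_array(model_b.ref_id)
        FROM model_a
        INNER JOIN model_b ON model_b.modela_id = model_a.id
        WHERE model_b.ref_type = 'c'
        GROUP BY model_a.id
        ORDER BY model_a.id
    """).fetchall()
    return {a_id: json.loads(arr) for a_id, arr in rows}


print(all_c_via_join(build_db()))
```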

You could also use FilteredRelation if you want the condition to be pushed to the JOIN instead.

from django.db.models import FilteredRelation, Q

ModelA.objects.annotate(
    all_c_rel=FilteredRelation(
        "modelb", condition=Q(modelb__ref_type="c")  # condition is keyword-only
    ),
    all_c=JSONArrayAgg("all_c_rel__ref_id"),
)

Which results in

SELECT
    model_a.*,
    JSON_ARRAYAGG(model_b.ref_id)
FROM model_a
LEFT OUTER JOIN model_b ON (
    model_b.modela_id = model_a.id
    AND model_b.ref_type = "c"
)
GROUP BY model_a.id

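But the LEFT OUTER JOIN might resurface the issue you're seeing with MySQL's handling of NULL in JSON_ARRAYAGG: a ModelA with no matching ModelB row is still returned, and its array contains a null instead of being empty.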
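The LEFT OUTER JOIN keeps every ModelA, but a ModelA with no matching row contributes one joined row whose model_b columns are all NULL, and the JSON aggregate turns that into [null]. A sqlite3 sketch of this (json_group_array standing in for JSON_ARRAYAGG; hypothetical sample rows), with the nulls stripped in Python afterwards:

```python
import json
import sqlite3


def build_db():
    # Hypothetical tables mirroring model_a / model_b from the question
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model_a (id INTEGER PRIMARY KEY);
        CREATE TABLE model_b (
            id INTEGER PRIMARY KEY,
            modela_id INTEGER REFERENCES model_a (id),
            ref_type TEXT,
            ref_id INTEGER
        );
        INSERT INTO model_a (id) VALUES (1), (2), (3);
        INSERT INTO model_b (modela_id, ref_type, ref_id) VALUES
            (1, 'c', 10), (1, 'c', 11), (2, 'c', 12), (2, 'x', 99), (3, 'x', 98);
    """)
    return conn


def all_c_via_left_join(conn):
    # The ref_type condition lives in the ON clause, so every model_a survives
    rows = conn.execute("""
        SELECT model_a.id, json_group_array(model_b.ref_id)
        FROM model_a
        LEFT OUTER JOIN model_b
          ON model_b.modela_id = model_a.id AND model_b.ref_type = 'c'
        GROUP BY model_a.id
        ORDER BY model_a.id
    """).fetchall()
    raw = {a_id: json.loads(arr) for a_id, arr in rows}
    # Drop the null produced by an unmatched LEFT OUTER JOIN row
    cleaned = {a_id: [x for x in ids if x is not None] for a_id, ids in raw.items()}
    return raw, cleaned


raw, cleaned = all_c_via_left_join(build_db())
print(raw[3], cleaned[3])
```

Filtering in Python is the simplest workaround here; whether to do it in SQL instead depends on your MySQL version and setup.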

Upvotes: 2

preator

Reputation: 1039

I'm assuming that ModelB is in fact a through table for an M2M relationship between ModelA and ModelC, as Işık Kaplan suggested.

In Postgres you could use ArrayAgg, as Işık Kaplan suggested. The MySQL equivalent is GROUP_CONCAT, but it isn't available in the ORM out of the box. Also, from personal experience I wouldn't recommend it, as it performed terribly in my use case.

What I ended up doing was combining 2 queries in Python, which was much faster than 1 complicated query with GROUP_CONCAT (around 60K ModelA records and 20K ModelB records in my case). In your case it would look like this:

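from collections import defaultdict

a_qs = ModelA.objects.all()

# Query 1: (ModelA id, related ModelC id) pairs; a ModelA without any
# ModelC comes back once with None as its ModelC id.
c_ids_dict = defaultdict(list)
c_ids = a_qs.values("id", "model_c_objects__id")
for item in c_ids:
    if item["model_c_objects__id"] is not None:
        c_ids_dict[item["id"]].append(item["model_c_objects__id"])

# Query 2: the ModelA objects themselves
for p in a_qs:
    print(p, c_ids_dict.get(p.id, []))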
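For comparison, the GROUP_CONCAT route returns a single delimited string per ModelA, which then has to be split and converted back to integers in Python. A sqlite3 sketch of that shape, with SQLite's group_concat standing in for MySQL's GROUP_CONCAT and hypothetical sample rows. Note that MySQL silently truncates GROUP_CONCAT output at group_concat_max_len (1024 by default), one more thing to watch with large groups.

```python
import sqlite3


def build_db():
    # Hypothetical tables mirroring model_a / model_b from the question
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model_a (id INTEGER PRIMARY KEY);
        CREATE TABLE model_b (
            id INTEGER PRIMARY KEY,
            modela_id INTEGER REFERENCES model_a (id),
            ref_type TEXT,
            ref_id INTEGER
        );
        INSERT INTO model_a (id) VALUES (1), (2), (3);
        INSERT INTO model_b (modela_id, ref_type, ref_id) VALUES
            (1, 'c', 10), (1, 'c', 11), (2, 'c', 12), (2, 'x', 99), (3, 'x', 98);
    """)
    return conn


def all_c_via_group_concat(conn):
    rows = conn.execute("""
        SELECT model_a.id, group_concat(model_b.ref_id, ',')
        FROM model_a
        LEFT OUTER JOIN model_b
          ON model_b.modela_id = model_a.id AND model_b.ref_type = 'c'
        GROUP BY model_a.id
        ORDER BY model_a.id
    """).fetchall()
    # group_concat skips NULLs and yields NULL (None) for an all-NULL group,
    # so an unmatched model_a maps to an empty list here.
    return {a_id: [int(x) for x in joined.split(",")] if joined else []
            for a_id, joined in rows}


print(all_c_via_group_concat(build_db()))
```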

Upvotes: 0

Işık Kaplan

Reputation: 3002

ModelB looks like a junction table: it has one field pointing to ModelA and one pointing to ModelC.

Django supports junction tables.

But when it comes to annotating the objects with the list of ids, I'm not entirely sure that is possible purely with the ORM.

from django.db import models

class ModelA(models.Model):
    model_c_objects = models.ManyToManyField("ModelC", through="ModelB")

class ModelB(models.Model):
    model_a = models.ForeignKey(ModelA, on_delete=models.CASCADE)
    model_c = models.ForeignKey("ModelC", on_delete=models.CASCADE)

class ModelC(models.Model):
    ...


# This one here I have no idea if it would work or not (ArrayAgg is PostgreSQL-only)
from django.contrib.postgres.aggregates import ArrayAgg

ModelA.objects.annotate(model_c_object_ids=ArrayAgg("model_c_objects__id"))

# If it doesn't:
class ModelA(models.Model):
    model_c_objects = models.ManyToManyField("ModelC", through="ModelB")

    @property
    def model_c_object_ids(self):
        # .all() reuses the prefetch_related() cache; values_list() would
        # issue a new query for every ModelA instead
        return [c.id for c in self.model_c_objects.all()]

# And you can then use it like you wished
for model_a_object in ModelA.objects.prefetch_related("model_c_objects"):
    model_a_object.model_c_object_ids  # list of ModelC ids, e.g. [1, 4, 12, 63]

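I'm feeling a bit lazy, but either of the two solutions should work: the ArrayAgg annotation runs as a single query, and the prefetch_related version uses two (one for ModelA, one for the related ModelC objects).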
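prefetch_related doesn't use a JOIN on the main query: Django runs one query for the ModelA rows and a second one for all related rows, filtered with an IN list over the collected primary keys, then stitches the results together in Python. A rough sqlite3 sketch of that idea, using a hypothetical through table model_b(model_a_id, model_c_id) shaped like the ModelB above (this is not Django's actual generated SQL):

```python
import sqlite3
from collections import defaultdict


def build_db():
    # Hypothetical M2M layout: model_b is the through table
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE model_a (id INTEGER PRIMARY KEY);
        CREATE TABLE model_c (id INTEGER PRIMARY KEY);
        CREATE TABLE model_b (
            id INTEGER PRIMARY KEY,
            model_a_id INTEGER REFERENCES model_a (id),
            model_c_id INTEGER REFERENCES model_c (id)
        );
        INSERT INTO model_a (id) VALUES (1), (2), (3);
        INSERT INTO model_c (id) VALUES (1), (4), (12), (63);
        INSERT INTO model_b (model_a_id, model_c_id) VALUES
            (1, 1), (1, 4), (2, 12), (2, 63);
    """)
    return conn


def c_ids_per_a(conn):
    # Query 1: the parent rows
    a_ids = [row[0] for row in conn.execute("SELECT id FROM model_a ORDER BY id")]
    # Query 2: all related rows for those parents at once, via IN (...)
    placeholders = ", ".join("?" for _ in a_ids)
    pairs = conn.execute(
        f"SELECT model_a_id, model_c_id FROM model_b "
        f"WHERE model_a_id IN ({placeholders}) ORDER BY model_c_id",
        a_ids,
    ).fetchall()
    # Stitch in Python: every ModelA gets a list, possibly empty
    grouped = defaultdict(list)
    for a_id, c_id in pairs:
        grouped[a_id].append(c_id)
    return {a_id: grouped[a_id] for a_id in a_ids}


print(c_ids_per_a(build_db()))
```

The cost is one extra round trip, but no per-row queries and no string parsing, which fits the experience that two simple queries beat one complicated GROUP_CONCAT query.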

Upvotes: 0
