alTus
alTus

Reputation: 2215

How to aggregate (min/max etc.) over Django JSONField data?

I'm using Django 1.9 with its built-in JSONField and Postgres 9.4. In my model's attrs json field I store objects with some values, including numbers. And I need to aggregate over them to find min/max values. Something like this:

Model.objects.aggregate(min=Min('attrs__my_key'))

Also, it would be useful to extract specific keys:

Model.objects.values_list('attrs__my_key', flat=True)

The above queries fail with

FieldError: "Cannot resolve keyword 'my_key' into field. Join on 'attrs' not permitted."

Is it possible somehow?

Notes:

  1. I know how to make a plain Postgres query to do the job, but am searching specifically for an ORM solution to have the ability to filter etc.
  2. I suppose this can be done with a (relatively) new query expressions/lookups API, but I haven't studied it yet.

Upvotes: 39

Views: 17156

Answers (7)

dinosaurwaltz
dinosaurwaltz

Reputation: 1771

From django 4.2 There is the new KT() expression that makes all of this a bit clearer. It's also secretly just KeyTextTransform under the hood

Model.objects.annotate(
    val=KT('json_field__key')
).aggregate(min=Min('val'))

From django 1.11 (which isn't out yet, so this might change) you can use django.contrib.postgres.fields.jsonb.KeyTextTransform instead of RawSQL .

In django 1.10 you have to copy/paste KeyTransform to you own KeyTextTransform and replace the -> operator with ->> and #> with #>> so it returns text instead of json objects.

Model.objects.annotate(
    val=KeyTextTransform('json_field_key', 'blah__json_field'))
).aggregate(min=Min('val')

You can even include KeyTextTransforms in SearchVectors for full text search

Model.objects.annotate(
    search=SearchVector(
        KeyTextTransform('jsonb_text_field_key', 'json_field'))
    )
).filter(search='stuff I am searching for')

Remember you can also index in jsonb fields, so you should consider that based upon your specific workload.

Upvotes: 48

maerteijn
maerteijn

Reputation: 650

Since Django 3.1 the KeyTextTransform function on a JSON field works for all database backends. It maps to the ->> operator in Postgres.

It can be used to annotate a specific JSON value inside a JSONField on the queryset results before you aggregate it. A more clear example how to utilize this:

First we need to annotate the key you want to aggregate. So if you have a Django model with a JSONField named data and the JSON containing looks like this:

{
    "age": 43,
    "name" "John"
}

You would annotate the queryset as following:

from django.db.models import IntegerField
from django.db.models.fields.json import KeyTextTransform

qs = Model.objects.annotate(
    age=Cast(
        KeyTextTransform("age", "data"), models.IntegerField()
    )

The Cast is needed to stay compatible with all database backend.

Now you can aggregate to your liking:

from django.db.models import Min, Max, Avg, IntegerField
from django.db.models.functions import Cast, Round

qs.aggregate(
    min_age=Round(Min("age")),
    max_age=Round(Max("age")),
    avg_age=Cast(Round(Avg("age")), IntegerField()),
)

>>> {'min_age': 25, 'max_age' 82:, 'avg_age': 33}

Upvotes: 1

Agawane
Agawane

Reputation: 173

from django.db.models.functions import Cast
from django.db.models import Max, Min

qs = Model.objects.annotate(
    val=Cast('attrs__key', FloatField())
).aggregate(
    min=Min("val"),
    max=Max("val")
)

Upvotes: 0

SamMorrowDrums
SamMorrowDrums

Reputation: 572

It is possible to do this using a Postgres function

https://www.postgresql.org/docs/9.5/functions-json.html

from django.db.models import Func, F, FloatField
from django.db.models.expressions import Value
from django.db.models.functions import Cast

text = Func(F(json_field), Value(json_key), function='jsonb_extract_path_text')
floatfield = Cast(text, FloatField())

Model.objects.aggregate(min=Min(floatfield))

This is much better than using the RawQuery because it doesn't break if you do a more complex query, where Django uses aliases and where there are field name collisions. There is so much going on with the ORM that can bite you with hand written implementations.

Upvotes: 7

Duncan
Duncan

Reputation: 161

I know this is a bit late (several months) but I came across the post while trying to do this. Managed to do it by:

1) using KeyTextTransform to convert the jsonb value to text

2) using Cast to convert it to integer, so that the SUM works:

q = myModel.objects.filter(type=9) \
.annotate(numeric_val=Cast(KeyTextTransform(sum_field, 'data'), IntegerField()))  \
.aggregate(Sum('numeric_val'))

print(q)

where 'data' is the jsonb property, and 'numeric_val' is the name of the variable I create by annotating.

Hope this helps somebody!

Upvotes: 15

matt
matt

Reputation: 1066

Seems there is no native way to do it.

I worked around like this:

my_queryset = Product.objects.all() # Or .filter()...
max_val = max(o.my_json_field.get(my_attrib, '') for o in my_queryset)

This is far from being marvelous, since it is done at the Python Level (and not at the SQL level).

Upvotes: 0

alTus
alTus

Reputation: 2215

For those who interested, I've found the solution (or workaround at least).

from django.db.models.expressions import RawSQL

Model.objects.annotate(
    val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
).aggregate(min=Min('val')

Note that attrs->>%s expression will become smth like attrs->>'width' after processing (I mean single quotes). So if you hardcode this name you should remember to insert them or you will get error.

/// A little bit offtopic ///

And one more tricky issue not related to django itself but that is needed to be handled somehow. As attrs is json field and there're no restrictions on its keys and values you can (depending on you application logic) get some non-numeric values in, for example, width key. In this case you will get DataError from postgres as a result of executing the above query. NULL values will be ignored meanwhile so it's ok. If you can just catch the error then no problem, you're lucky. In my case I needed to ignore wrong values and the only way here is to write custom postgres function that will supress casting errors.

create or replace function safe_cast_to_numeric(text) returns numeric as $$
begin
    return cast($1 as numeric);
exception
    when invalid_text_representation then
        return null;
end;
$$ language plpgsql immutable;

And then use it to cast text to numbers:

Model.objects.annotate(
    val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
).aggregate(min=Min('val')

Thus we get quite solid solution for such a dynamic thing as json.

Upvotes: 22

Related Questions