SQLAlchemy - filtering func.count within query

Question

Let's say that I have a table with a column, that has some integer values and I want to calculate the percentage of values that are over 200 for that column.

Here's the kicker, I would prefer if I could do it inside one query that I could use group_by on.

results = db.session.query(
        ClassA.some_variable,
        label('entries', func.count(ClassA.some_variable)),
        label('percent', *no clue*)
  ).filter(ClassA.value.isnot(None)).group_by(ClassA.some_variable)

Alternately it would be okay thought not prefered to do the percentage calculation on the client side, something like this.

results = db.session.query(
        ClassA.some_variable,
        label('entries', func.count(ClassA.some_variable)),
        label('total_count', func.count(ClassA.value)),
        label('over_200_count', func.count(ClassA.value > 200)),
  ).filter(ClassA.value.isnot(None)).group_by(ClassA.some_variable)

But I obviously can't filter within the count statemenet, and I can't apply the filter at the end of the query, since if I apply the > 200 constraint at the end, total_count wouldn't work.

Using RAW SQL is an option too, it doesn't have to be Sqlalchemy

Ilja Everil&#228; · Accepted Answer

MariaDB unfortunately does not support the aggregate FILTER clause, but you can work around that using a CASE expression or NULLIF, since COUNT returns the count of non-null values of given expression:

from sqlalchemy import case

...

func.count(case([(ClassA.value > 200, 1)])).label('over_200_count')

With that in mind you can calculate the percentage simply as

(func.count(case([(ClassA.value > 200, 1)])) * 1.0 /
 func.count(ClassA.value)).label('percent')

though there's that one edge: what if func.count(ClassA.value) is 0? Depending on whether you'd consider 0 or NULL a valid return value you could either use yet another CASE expression or NULLIF:

dividend = func.count(case([(ClassA.value > 200, 1)])) * 1.0
divisor = func.count(ClassA.value)

# Zero
case([(divisor == 0, 0)],
     else_=dividend / divisor).label('percent')

# NULL
(dividend / func.nullif(divisor, 0)).label('percent')

Finally, you could create a compilation extension for mysql dialect that rewrites a FILTER clause to a suitable CASE expression:

from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import FunctionFilter
from sqlalchemy.sql.functions import Function
from sqlalchemy import case


@compiles(FunctionFilter, 'mysql')
def compile_functionfilter_mysql(element, compiler, **kwgs):
    # Support unary functions only
    arg0, = element.func.clauses

    new_func = Function(
        element.func.name,
        case([(element.criterion, arg0)]),
        packagenames=element.func.packagenames,
        type_=element.func.type,
        bind=element.func._bind)

    return new_func._compiler_dispatch(compiler, **kwgs)

With that in place you could express the dividend as

dividend = func.count(1).filter(ClassA.value > 200) * 1.0

which compiles to

In [28]: print(dividend.compile(dialect=mysql.dialect()))
count(CASE WHEN (class_a.value > %s) THEN %s END) * %s

SQLAlchemy - filtering func.count within query

Answers (1)

Related Questions