Reputation: 3318
Say I have data on a table as such:
id | other_id | category | amount
--------------------------------
1 | abc | widget | 100
2 | abc | widget | 200
3 | def | widget | 100
4 | ghi | gadget | 100
5 | ghi | gadget | 100
6 | jkl | gadget | 100
7 | jkl | gadget | 100
I want to query this table to return
(other_id, category, sum_of_amount)
where sum_of_amount
is the sum of the amount
column over the all rows in the same other_id
. In addition to this, I want to exclude the tuples where the combination of category
and sum_of_amount
are NOT unique. So the query should return tuples:
(abc, widget, 300)
(def, widget, 100)
and not any of the gadget
rows because the combination (gadget, 200)
is not unique.
So far, I have this:
with session_scope() as db_session:
query = db_session.query(
ModelClass.other_id,
ModelClass.category,
label('sum_of_amount', func.sum(ModelClass.amount))
).group_by(
ModelClass.other_id,
ModelClass.category
)
This query is not filtering anything out. I think I need to use distinct somehow but I can't figure out.
Upvotes: 1
Views: 3018
Reputation: 1157
You can generate the result set of (ohter_id, category, sum_of_amount)
with the query as you noted::
=# SELECT other_id, category, SUM(amount) AS sum_of_amount
FROM test
GROUP BY other_id, category;
other_id │ category │ sum_of_amount
──────────┼──────────┼───────────────
abc │ widget │ 300
ghi │ gadget │ 200
jkl │ gadget │ 200
def │ widget │ 100
(4 rows)
And then, you have to exclude rows whose (category, sum_of_amount)
are not unique. In above result set to determine every rows' uniqueness, you can add new column containing the number of rows having same (category, sum_of_amount)
like below:
=# SELECT other_id, category, SUM(amount) AS sum_of_amount,
COUNT(*) OVER (PARTITION BY category, SUM(amount)) AS duplicates
FROM test
GROUP BY other_id, category;
other_id │ category │ sum_of_amount │ duplicates
──────────┼──────────┼───────────────┼───────
ghi │ gadget │ 200 │ 2
jkl │ gadget │ 200 │ 2
def │ widget │ 100 │ 1
abc │ widget │ 300 │ 1
(4 rows)
As you can see in above demonstration, you have determinants in the hands. Now you can generate the result set that you want to find by adding WHERE
clause using the duplicates
column. Since window functions(the OVER
part of duplicates
column) are not allowed in WHERE clauses and we have to evaluate the result of window function after sums of amounts have calculated, we have to use subquery.
=# SELECT other_id, category, sum_of_amount
FROM (
SELECT other_id, category, SUM(amount) AS sum_of_amount,
COUNT(*) OVER (PARTITION BY category, SUM(amount)) AS duplicates
FROM test
GROUP BY other_id, category
) d
WHERE duplicates = 1
ORDER BY other_id, category;
other_id │ category │ sum_of_amount
──────────┼──────────┼───────────────
abc │ widget │ 300
def │ widget │ 100
(2 rows)
SQLAlchemy query expression of above SQL could be:
from sqlalchemy import func, over
sum_of_amount = label('sum_of_amount', func.sum(ModelClass.amount))
duplicates = over(func.count('*'),
partition_by=(ModelClass.category, sum_of_amount))
query = db_session.query(
ModelClass.other_id,
ModelClass.category,
sum_of_amount,
duplicates,
).group_by(
ModelClass.other_id,
ModelClass.category
).from_self(
ModelClass.other_id,
ModelClass.category,
sum_of_amount
).filter(duplicates == 1)
Upvotes: 2