CuriousMind

Reputation: 34145

SQL Query for sum of unique amounts, remove duplicates

Consider the following MySQL table schema:

id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime

The above schema is used to store POS receipts for restaurants. To get a daily report of the count of receipts and their sum, I tried the following query:

SELECT location_id,
       count(distinct(transaction_no)) as count,
       sum(amount) as receipt_amount
FROM `receipts`
WHERE date(`receipts`.`created_at`) = '2015-05-17'
GROUP BY `receipts`.`location_id`

But the issue is that a receipt with the same transaction number can be repeated multiple times, and each time the amount may or may not be different. The business rule for handling this is that the last receipt we received is the authoritative one, so the above query doesn't work: sum(amount) adds up every copy of a repeated receipt.
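
To see the problem concretely, here is a small sketch that runs the query above against an in-memory SQLite database. SQLite standing in for MySQL is an assumption made only to keep the example self-contained, and the rows (including transaction "T1" being received twice) are hypothetical:

```python
import sqlite3

# Hypothetical sample rows: (id, amount, transaction_no, location_id, created_at).
# SQLite stands in for MySQL; created_at is stored as ISO-8601 text.
ROWS = [
    (1, 10.00, "T1", 1, "2015-05-17 09:00:00"),
    (2, 12.00, "T1", 1, "2015-05-17 09:05:00"),  # T1 sent again; this later copy should win
    (3, 5.00, "T2", 1, "2015-05-17 10:00:00"),
    (4, 7.50, "T3", 2, "2015-05-17 11:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO receipts VALUES (?, ?, ?, ?, ?)", ROWS)

# The query from the question (ORDER BY added only to make the output order stable).
result = conn.execute(
    """
    SELECT location_id,
           COUNT(DISTINCT transaction_no) AS count,
           SUM(amount) AS receipt_amount
    FROM receipts
    WHERE date(created_at) = '2015-05-17'
    GROUP BY location_id
    ORDER BY location_id
    """
).fetchall()
print(result)
```

Location 1 reports 2 transactions but a sum of 27 (10 + 12 + 5), whereas keeping only the last copy of T1 would give 12 + 5 = 17.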

What I am looking to do is the following:

  1. For each location, get all the receipts of that day.
  2. If a transaction_no is duplicated, keep only the last received receipt, based on created_at.
  3. Sum the amount column of the remaining receipts.
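
The three steps can be sketched in plain Python, without a database, to pin down the intended result. The tuple layout and sample data are hypothetical, and the sketch assumes duplicates are detected per (location_id, transaction_no) within the same day, following the "for each location" in step 1:

```python
from collections import defaultdict

# Hypothetical receipts: (amount, transaction_no, location_id, created_at).
RECEIPTS = [
    (10.00, "T1", 1, "2015-05-17 09:00:00"),
    (12.00, "T1", 1, "2015-05-17 09:05:00"),  # later copy of T1 replaces the first
    (5.00, "T2", 1, "2015-05-17 10:00:00"),
    (7.50, "T3", 2, "2015-05-17 11:00:00"),
    (3.00, "T4", 2, "2015-05-16 23:00:00"),  # another day, ignored
]


def daily_report(receipts, day):
    """Return {location_id: (receipt_count, amount_sum)} for one day."""
    # Step 1: keep only the receipts created on `day` ("YYYY-MM-DD").
    todays = [r for r in receipts if r[3][:10] == day]

    # Step 2: per (location_id, transaction_no) keep the latest created_at.
    latest = {}
    for row in todays:
        key = (row[2], row[1])
        if key not in latest or row[3] > latest[key][3]:
            latest[key] = row

    # Step 3: count and sum the surviving receipts per location.
    totals = defaultdict(lambda: [0, 0.0])
    for amount, _txn, location_id, _created_at in latest.values():
        totals[location_id][0] += 1
        totals[location_id][1] += amount
    return {loc: (count, total) for loc, (count, total) in totals.items()}


report = daily_report(RECEIPTS, "2015-05-17")
print(report)  # {1: (2, 17.0), 2: (1, 7.5)}
```

Comparing created_at as strings works here only because the timestamps are zero-padded ISO-8601 text.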

[edit]

Here is the query plan:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25814155
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: r
         type: ref
possible_keys: punchh_key_location_id_created_at
          key: punchh_key_location_id_created_at
      key_len: 50
          ref: t.punchh_key
         rows: 1
     filtered: 100.00
        Extra: Using index condition; Using where
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: r
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 25814155
     filtered: 100.00
        Extra: Using temporary; Using filesort
3 rows in set, 1 warning (0.00 sec)

Upvotes: 1

Views: 3483

Answers (3)

Gordon Linoff

Reputation: 1269743

How do you count a transaction that is repeated on multiple days?

I presume that you do not want to count a transaction simply because it is the last one on a given day if there is another receipt for it on the next day. You can get the final record for each transaction in several ways. A typical way is to use group by (this is similar to Brian's query, but subtly different: the subquery here is not restricted to a single day):

select r.*
from receipts r join
     (select transaction_no, max(created_at) as maxca
      from receipts r
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca;
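
A minimal check of this pattern, assuming SQLite as a stand-in for MySQL and made-up rows, shows that only the latest copy of each transaction survives the join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
# Hypothetical rows: T1 was received twice, T2 once.
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10.00, "T1", 1, "2015-05-17 09:00:00"),
        (2, 12.00, "T1", 1, "2015-05-17 09:05:00"),
        (3, 5.00, "T2", 1, "2015-05-17 10:00:00"),
    ],
)

# Same shape as the query above: the derived table t holds MAX(created_at) per
# transaction_no, and the join keeps only the row(s) carrying that timestamp.
latest = conn.execute(
    """
    SELECT r.id, r.transaction_no, r.amount
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS maxca
          FROM receipts
          GROUP BY transaction_no) t
      ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
    ORDER BY r.id
    """
).fetchall()
print(latest)
```

If two copies of a transaction share the same max(created_at), the equality join returns both of them.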

The full query is then:

select location_id, count(*) as numtransactions, sum(amount) as receipt_amount
from receipts r join
     -- t: the latest created_at for each transaction_no, over all days
     (select transaction_no, max(created_at) as maxca
      from receipts r
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca
where r.created_at >= date('2015-05-17') and r.created_at < date('2015-05-18')
group by location_id;
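
Here is a sketch of the full query run against SQLite (an assumption: the :day parameter and date(:day, '+1 day') are SQLite syntax standing in for the two MySQL date constants), with hypothetical rows where T1 is re-sent on the next day. It illustrates the point above that a transaction is counted only on the day of its final receipt:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
# Hypothetical rows: T1 is first received on 05-17, then re-sent on 05-18.
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10.00, "T1", 1, "2015-05-17 09:00:00"),
        (2, 5.00, "T2", 1, "2015-05-17 10:00:00"),
        (3, 12.00, "T1", 1, "2015-05-18 08:00:00"),
    ],
)

FULL_QUERY = """
SELECT r.location_id, COUNT(*) AS numtransactions, SUM(r.amount) AS receipt_amount
FROM receipts r
JOIN (SELECT transaction_no, MAX(created_at) AS maxca
      FROM receipts
      GROUP BY transaction_no) t
  ON r.transaction_no = t.transaction_no AND r.created_at = t.maxca
WHERE r.created_at >= date(:day) AND r.created_at < date(:day, '+1 day')
GROUP BY r.location_id
"""

may_17 = conn.execute(FULL_QUERY, {"day": "2015-05-17"}).fetchall()
may_18 = conn.execute(FULL_QUERY, {"day": "2015-05-18"}).fetchall()
# T1 is reported only on 05-18 (its final receipt), never on 05-17.
print(may_17, may_18)
```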

Note about date comparison.

Your original form, date(r.created_at) = '2015-05-17', is logically correct. However, wrapping the column in date() means that an index on created_at cannot be used. The form with two comparisons to constants allows the query to take advantage of an index on receipts(created_at).
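
One way to see the index effect is to compare query plans. This sketch uses SQLite's EXPLAIN QUERY PLAN rather than MySQL's EXPLAIN, so the output text differs from the plan in the question; the index name idx_receipts_created_at is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_receipts_created_at ON receipts (created_at)")


def plan(sql):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Function applied to the column: the planner cannot search the index.
with_function = plan(
    "SELECT amount FROM receipts WHERE date(created_at) = '2015-05-17'"
)
# Bare column compared to two constants: the planner can do an index range search.
with_range = plan(
    "SELECT amount FROM receipts"
    " WHERE created_at >= '2015-05-17' AND created_at < '2015-05-18'"
)
print(with_function)
print(with_range)
```

With date(created_at) the plan is a full table scan, while the range form searches idx_receipts_created_at.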

The use of like for dates is to be discouraged. It requires implicitly converting the date to a string and then comparing strings. These conversions are needless, and in some databases they make the semantics depend on globalization settings.

Upvotes: 1

Brian DeMilia

Reputation: 13248

You can sum up the amounts for just the last created_at value within the same day by joining to an inline view that determines the last created_at for each transaction_no on that day.

This avoids simply using sum(distinct ...), because with that, two different transactions with the same amount, if such exist, would be counted only once.

This approach should avoid that problem.
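
A quick illustration of the sum(distinct ...) pitfall, assuming SQLite and two hypothetical transactions that happen to have the same amount:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
# Hypothetical rows: two *different* transactions that cost the same.
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?)",
    [
        (1, 5.00, "T1", 1, "2015-05-17 09:00:00"),
        (2, 5.00, "T2", 1, "2015-05-17 10:00:00"),
    ],
)

distinct_sum, plain_sum = conn.execute(
    "SELECT SUM(DISTINCT amount), SUM(amount) FROM receipts"
).fetchone()
# SUM(DISTINCT amount) collapses the two 5.00 values into one.
print(distinct_sum, plain_sum)
```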

select      r.location_id,
            count(*) as num_transactions,
            sum(r.amount) as receipt_amount
from        receipts r
       join (
                select      transaction_no,
                            max(created_at) as last_created_at_for_trans
                from        receipts
                where       created_at like '2015-05-17%'
                group by    transaction_no
            ) v
         on r.transaction_no = v.transaction_no
        and r.created_at = v.last_created_at_for_trans
where       r.created_at like '2015-05-17%'
group by    r.location_id
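
A sketch of this query against SQLite with hypothetical rows (SQLite standing in for MySQL is an assumption; LIKE on a text created_at behaves the same way here). Because the inline view is limited to 2015-05-17, T3 is counted on that day even though it is received again on 2015-05-18:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
# Hypothetical rows: T1 is re-sent the same day, T3 is re-sent the next day.
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10.00, "T1", 1, "2015-05-17 09:00:00"),
        (2, 12.00, "T1", 1, "2015-05-17 09:05:00"),
        (3, 5.00, "T2", 1, "2015-05-17 10:00:00"),
        (4, 8.00, "T3", 1, "2015-05-17 12:00:00"),
        (5, 9.00, "T3", 1, "2015-05-18 08:00:00"),
    ],
)

# Inline view v: the last created_at per transaction_no, within 2015-05-17 only.
join_result = conn.execute(
    """
    SELECT r.location_id, COUNT(*) AS num_transactions, SUM(r.amount) AS receipt_amount
    FROM receipts r
    JOIN (SELECT transaction_no, MAX(created_at) AS last_created_at_for_trans
          FROM receipts
          WHERE created_at LIKE '2015-05-17%'
          GROUP BY transaction_no) v
      ON r.transaction_no = v.transaction_no
     AND r.created_at = v.last_created_at_for_trans
    WHERE r.created_at LIKE '2015-05-17%'
    GROUP BY r.location_id
    """
).fetchall()
# T1 (12) + T2 (5) + T3 (8) = 25 over 3 transactions.
print(join_result)
```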

Another approach is to use not exists; you might want to test which one performs better:

select      r.location_id,
            count(*) as num_transactions,
            sum(r.amount) as receipt_amount
from        receipts r
where       r.created_at like '2015-05-17%'
        and not exists ( select 1
                         from   receipts x
                         where  x.transaction_no = r.transaction_no
                            and x.created_at > r.created_at
                       )
group by    r.location_id
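
Running both of the queries above on the same hypothetical rows (SQLite again standing in for MySQL) suggests comparing the results as well as the speed: the not exists subquery has no date filter, so a transaction with a later receipt on another day is dropped, whereas the join version keeps it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE receipts (id INTEGER, amount DECIMAL, transaction_no TEXT,"
    " location_id INTEGER, created_at TEXT)"
)
# Hypothetical rows: T1 is re-sent the same day, T3 is re-sent the next day.
conn.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10.00, "T1", 1, "2015-05-17 09:00:00"),
        (2, 12.00, "T1", 1, "2015-05-17 09:05:00"),
        (3, 5.00, "T2", 1, "2015-05-17 10:00:00"),
        (4, 8.00, "T3", 1, "2015-05-17 12:00:00"),
        (5, 9.00, "T3", 1, "2015-05-18 08:00:00"),
    ],
)

JOIN_VERSION = """
SELECT r.location_id, COUNT(*), SUM(r.amount)
FROM receipts r
JOIN (SELECT transaction_no, MAX(created_at) AS last_created_at_for_trans
      FROM receipts
      WHERE created_at LIKE '2015-05-17%'
      GROUP BY transaction_no) v
  ON r.transaction_no = v.transaction_no
 AND r.created_at = v.last_created_at_for_trans
WHERE r.created_at LIKE '2015-05-17%'
GROUP BY r.location_id
"""

NOT_EXISTS_VERSION = """
SELECT r.location_id, COUNT(*), SUM(r.amount)
FROM receipts r
WHERE r.created_at LIKE '2015-05-17%'
  AND NOT EXISTS (SELECT 1
                  FROM receipts x
                  WHERE x.transaction_no = r.transaction_no
                    AND x.created_at > r.created_at)
GROUP BY r.location_id
"""

join_result = conn.execute(JOIN_VERSION).fetchall()
not_exists_result = conn.execute(NOT_EXISTS_VERSION).fetchall()
# Join: T1 (12) + T2 (5) + T3 (8). NOT EXISTS: T3 dropped because of its 05-18 receipt.
print(join_result, not_exists_result)
```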

Upvotes: 2

Mureinik

Reputation: 311198

You can use the distinct modifier in sum too:

SELECT   location_id,
         COUNT(DISTINCT transaction_no) AS cnt,
         SUM(DISTINCT amount) AS receipt_amount 
FROM     `receipts`  
WHERE    DATE(`receipts`.`created_at`) = '2015-05-17' 
GROUP BY `receipts`.`location_id`

Upvotes: 1
