Reputation: 34145
Consider the following MySQL table schema:
id int,
amount decimal,
transaction_no,
location_id int,
created_at datetime
The above schema is used to store POS receipts for restaurants. To get a daily report of the number of receipts and their total amount, I tried the following query:
SELECT location_id,count(distinct(transaction_no)) as count,sum(amount) as receipt_amount FROM `receipts` WHERE date(`receipts`.`created_at`) = '2015-05-17' GROUP BY `receipts`.`location_id`
But the issue is that a receipt with the same transaction number can be repeated multiple times, and each time the amount may or may not be different. The business rule for handling this is that the last receipt we received is the latest one, so the above query doesn't work.
What I am looking to do is the following:
[edit]
here is the query plan:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 25814155
filtered: 100.00
Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: r
type: ref
possible_keys: punchh_key_location_id_created_at
key: punchh_key_location_id_created_at
key_len: 50
ref: t.punchh_key
rows: 1
filtered: 100.00
Extra: Using index condition; Using where
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: r
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 25814155
filtered: 100.00
Extra: Using temporary; Using filesort
3 rows in set, 1 warning (0.00 sec)
Upvotes: 1
Views: 3483
Reputation: 1269743
How do you count a transaction that is repeated on multiple days?
I presume that you actually do not want to count a transaction, simply because it is the last one on the day, if there is another receipt on the next day. You can get the final record for each transaction in several ways. A typical way is to use group by
(this is similar to Brian's query, but subtly different):
select r.*
from receipts r join
(select transaction_no, max(created_at) as maxca
from receipts r
group by transaction_no
) t
on r.transaction_no = t.transaction_no and r.created_at = t.maxca;
The full query is then:
select location_id, count(*) as numtransactions, sum(amount) as receipt_amount
from receipts r join
(select transaction_no, max(created_at) as maxca
from receipts r
group by transaction_no
) t
on r.transaction_no = t.transaction_no and r.created_at = t.maxca
where r.created_at >= date('2015-05-17') and r.created_at < date('2015-05-18')
group by location_id;
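To see how the full query behaves, here is a minimal sketch that runs it against a handful of made-up rows. It uses Python's sqlite3 module as a stand-in for MySQL, so the column types, the sample data, and the plain string date literals are all assumptions for illustration only. Transaction T1 is sent twice on the same day, and T3 is sent again on the following day.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table receipts (id integer, amount real, transaction_no text, "
    "location_id integer, created_at text)"
)

# Hypothetical sample rows, not taken from the question.
conn.executemany(
    "insert into receipts values (?, ?, ?, ?, ?)",
    [
        (1, 10.0, "T1", 1, "2015-05-17 09:00:00"),
        (2, 12.0, "T1", 1, "2015-05-17 09:05:00"),  # later copy of T1
        (3, 5.0, "T2", 1, "2015-05-17 10:00:00"),
        (4, 7.0, "T3", 2, "2015-05-17 11:00:00"),
        (5, 8.0, "T3", 2, "2015-05-18 08:00:00"),  # T3 resent the next day
    ],
)

# Keep only the latest receipt per transaction, then restrict to one day.
query = """
select r.location_id, count(*) as numtransactions, sum(r.amount) as receipt_amount
from receipts r join
     (select transaction_no, max(created_at) as maxca
      from receipts
      group by transaction_no
     ) t
     on r.transaction_no = t.transaction_no and r.created_at = t.maxca
where r.created_at >= '2015-05-17' and r.created_at < '2015-05-18'
group by r.location_id
order by r.location_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With this data, only the later copy of T1 is summed, and T3 does not appear in the report for the 17th because its final receipt falls on the 18th.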
Note about date comparison.
Your original form of date(r.created_at) = '2015-05-17' is logically correct. However, the use of date() means that an index cannot be used. The form with two comparisons to constants would allow the query to take advantage of an index on receipts(created_at).
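The effect of wrapping the column in a function can be observed in a query planner. The sketch below uses SQLite (through Python's sqlite3 module) rather than MySQL, so it only illustrates the general principle; the index name idx_receipts_created_at is hypothetical, and on MySQL you would compare the EXPLAIN output of the two forms instead.

```python
import sqlite3

# SQLite stands in for MySQL here (assumption); planners differ in detail.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table receipts (id integer, amount real, transaction_no text, "
    "location_id integer, created_at text)"
)
# Hypothetical index on the timestamp column.
conn.execute("create index idx_receipts_created_at on receipts (created_at)")


def plan(sql):
    """Return the human-readable steps of SQLite's query plan for sql."""
    # The last column of each EXPLAIN QUERY PLAN row holds the description.
    return " | ".join(row[-1] for row in conn.execute("explain query plan " + sql))


# Column wrapped in a function: the index on created_at cannot be searched.
wrapped_plan = plan(
    "select sum(amount) from receipts where date(created_at) = '2015-05-17'"
)

# Bare column compared to two constants: a range search on the index is possible.
range_plan = plan(
    "select sum(amount) from receipts "
    "where created_at >= '2015-05-17' and created_at < '2015-05-18'"
)

print("date() form:", wrapped_plan)
print("range form :", range_plan)
```

The exact wording of the plan depends on the SQLite version, but the range form should mention the index while the date() form should not.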
The use of like for dates is to be discouraged. It requires implicitly converting the date to a string and then doing the comparison as a string. This adds needless conversions and, in some databases, makes the semantics dependent on globalization settings.
Upvotes: 1
Reputation: 13248
You can sum up the amounts for just the last created_at value within the same day by joining to an inline view that determines the last created_at for each transaction_no in that day.
This avoids simply using sum(distinct ...
because otherwise two different transactions with the same amount, if such exist, would only be counted once.
This approach should avoid that problem.
select r.location_id,
count(*) as num_transactions,
sum(r.amount) as receipt_amount
from receipts r
join (
select transaction_no,
max(created_at) as last_created_at_for_trans
from receipts
where created_at like '2015-05-17%'
group by transaction_no
) v
on r.transaction_no = v.transaction_no
and r.created_at = v.last_created_at_for_trans
where r.created_at like '2015-05-17%'
group by r.location_id
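The concern about sum(distinct ...) mentioned above can be shown with two made-up rows. This is a small sketch using Python's sqlite3 module as a stand-in for MySQL; the table contents are hypothetical. Two different transactions happen to have the same amount, and the distinct modifier collapses them into one.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table receipts (id integer, amount real, transaction_no text, "
    "location_id integer, created_at text)"
)

# Hypothetical data: two separate transactions with identical amounts.
conn.executemany(
    "insert into receipts values (?, ?, ?, ?, ?)",
    [
        (1, 10.0, "T1", 1, "2015-05-17 09:00:00"),
        (2, 10.0, "T2", 1, "2015-05-17 10:00:00"),
    ],
)

# sum(distinct amount) treats the two equal amounts as a single value.
distinct_sum = conn.execute(
    "select sum(distinct amount) from receipts where location_id = 1"
).fetchone()[0]

# A plain sum counts both transactions.
plain_sum = conn.execute(
    "select sum(amount) from receipts where location_id = 1"
).fetchone()[0]

print(distinct_sum, plain_sum)
```

Here the distinct sum is half the real total, which is why the join on the last created_at per transaction is used instead.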
Another approach is to use not exists. You might want to test to see which provides better performance:
select r.location_id,
count(*) as num_transactions,
sum(r.amount) as receipt_amount
from receipts r
where r.created_at like '2015-05-17%'
and not exists ( select 1
from receipts x
where x.transaction_no = r.transaction_no
and x.created_at > r.created_at
)
group by r.location_id
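When testing the two queries, it may also be worth checking that they return what you expect for a transaction that is resent on a later day, because the inline view only looks within the day while the not exists subquery looks at all later receipts. The sketch below runs both on made-up rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption; like on a text column behaves the same way here).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table receipts (id integer, amount real, transaction_no text, "
    "location_id integer, created_at text)"
)

# Hypothetical data: T1 repeats within the day, T2 repeats on the next day.
conn.executemany(
    "insert into receipts values (?, ?, ?, ?, ?)",
    [
        (1, 10.0, "T1", 1, "2015-05-17 09:00:00"),
        (2, 12.0, "T1", 1, "2015-05-17 09:05:00"),
        (3, 7.0, "T2", 1, "2015-05-17 11:00:00"),
        (4, 8.0, "T2", 1, "2015-05-18 08:00:00"),
    ],
)

# Inline-view form: last receipt per transaction within the day.
join_rows = conn.execute("""
select r.location_id, count(*) as num_transactions, sum(r.amount) as receipt_amount
from receipts r
join (
    select transaction_no, max(created_at) as last_created_at_for_trans
    from receipts
    where created_at like '2015-05-17%'
    group by transaction_no
) v
  on r.transaction_no = v.transaction_no
 and r.created_at = v.last_created_at_for_trans
where r.created_at like '2015-05-17%'
group by r.location_id
""").fetchall()

# not exists form: a row counts only if no later receipt exists on any day.
not_exists_rows = conn.execute("""
select r.location_id, count(*) as num_transactions, sum(r.amount) as receipt_amount
from receipts r
where r.created_at like '2015-05-17%'
  and not exists (select 1
                  from receipts x
                  where x.transaction_no = r.transaction_no
                    and x.created_at > r.created_at)
group by r.location_id
""").fetchall()

print(join_rows)
print(not_exists_rows)
```

With this data the inline-view form counts T2 on the 17th, while the not exists form leaves it out because a newer receipt arrived on the 18th. Which behavior is right depends on the business rule.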
Upvotes: 2
Reputation: 311198
You can use the distinct modifier in sum too:
SELECT location_id,
COUNT(DISTINCT transaction_no) AS cnt,
SUM(DISTINCT amount) AS receipt_amount
FROM `receipts`
WHERE DATE(`receipts`.`created_at`) = '2015-05-17'
GROUP BY `receipts`.`location_id`
Upvotes: 1