Reputation: 6305
Here is my SQL
query:
SELECT SUM(amz_event_shipment_items.quantity),
amz_event_shipment_items.seller_sku
FROM amz_event_shipment_items
INNER JOIN amz_event_fees ON amz_event_shipment_items.id = amz_event_fees.shipment_item_id
INNER JOIN amz_shipment_events ON amz_shipment_events.id = amz_event_shipment_items.shipment_event_id
WHERE amz_event_fees.currency = 'USD'
AND amz_shipment_events.event_type <> 'RefundEvent'
AND amz_shipment_events.posted_date BETWEEN '2016-5-1 07:00:00' AND '2016-5-7 06:59:59'
GROUP BY amz_event_shipment_items.seller_sku
But the returned values are too high... Can't make sense to me...
Am I missing anything?
Edit
Many shipment_events for each date
Each shipment_event HAS MANY shipment_item / BELONGS TO ONE event
Each shipment_item HAS MANY shipment_fee / BELONGS TO ONE item
Upvotes: 0
Views: 67
Reputation: 94884
This is less an answer but an attachment. If I understand correctly, your query returned the wrong results but was reasonably fast, whereas mine (with the EXISTS
clauses) returns the correct results, but is extremely slow.
So it seems the task of eliminating the duplicates takes too much time.
Two ideas here:
First idea: Eliminate duplicates immediately
Instead of joning to fees, we aggregate fees before joining:
select
sum(i.quantity),
i.seller_sku
from amz_event_shipment_items i
join -- join with only one record per ID to substitute an EXISTS clause
(
select distinct shipment_item_id
from amz_event_fees
where f.currency = 'USD'
) f on f.shipment_item_id = i.id
and exists
(
select *
from amz_shipment_events e
where e.event_type <> 'RefundEvent'
and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
and e.id = i.shipment_event_id
)
group by i.seller_sku;
Second idea: Pre-aggregate values
Here we try to aggregate as soon as we can, so as to keep the intermediate result small and not have to look up the events table for every single items record.
select
sum(i.pre_sum_quantity),
i.seller_sku
from
(
select seller_sku, shipment_event_id, sum(quantity) as pre_sum_quantity
from amz_event_shipment_items
where exists
(
select *
from amz_event_fees f
where f.currency = 'USD'
and f.shipment_item_id = amz_event_shipment_items.id
)
group by seller_sku, shipment_event_id
) i
where exists
(
select *
from amz_shipment_events e
where e.event_type <> 'RefundEvent'
and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
and e.id = i.shipment_event_id
)
group by i.seller_sku;
In case there are only few event types, you could also try to get rid of the <>
, thus making it a bit more likely an index is used:
where e.event_type in ('EarlyPaymentEvent','LatePaymentEvent')
(In that case it might pay to have an index where event_type
comes before the posted_date
.)
I must admit, I don't think any of this will be much faster than the original EXISTS
query, but it may be worth a try.
Upvotes: 1
Reputation: 94884
You are multiplying the quantities by the number of fees. Use an IN
or EXISTS
clause when looking for mere existence.
select
sum(i.quantity),
i.seller_sku
from amz_event_shipment_items i
where exists
(
select *
from amz_event_fees f
where f.currency = 'USD'
and f.shipment_item_id = i.id
)
and exists
(
select *
from amz_shipment_events e
where e.event_type <> 'RefundEvent'
and e.posted_date between '2016-05-01 07:00:00' and '2016-05-07 06:59:59'
and e.id = i.shipment_event_id
)
group by i.seller_sku;
(MySQL is known to be slow on IN
clauses sometimes, so I am using EXISTS
here, although I like IN
better.)
Upvotes: 1
Reputation: 1792
Possible that one of your joins is returning more records than you expect. I would try just doing a select *
and sort by sku and eyeball the results.
Upvotes: 0