Reputation: 25
I have the following indexes
create index i_payment_amount ON payment(amount);
create index i_customer_createdate ON customer(createdate);
and the following query
select c.createdate, c.firstname, c.lastname, round(sum(p.amount)) as spentmoney
from customer c
join rental r
on c.customerid = r.customerid
join payment p
on p.rentalid = r.rentalid
where c.createdate > date '2019-06-01' + 30
or (select round(sum(pp.amount)) from payment pp
join rental rr
on rr.rentalid = pp.rentalid
where rr.rentalid = r.rentalid) < 50
group by c.firstname, c.lastname,c.createdate
order by c.firstname, c.lastname;
The query is calculating the customers who registered in one month before 2019-06-01 and also getting customers who did not spend more than $50. And I wanted to optimize it with the help of indexes.
I created b-tree indexes and With the first try, I wanted to even make the indexes appear in the query plan, but they didn't.
I also couldn't create a function-based index for the payment, because it does not support the group function (sum). Are there any suggestions to create a proper index, which will optimize the query or even use them?
Upvotes: 0
Views: 141
Reputation: 15991
I don't see any obvious candidates for indexing in the query.
Say there are 2 million rows in customer
, and 1 million of them have createdate > date '2019-06-01' + 30
(which can be simplified to createdate > date '2019-07-01'
). Using an index to find those 1 million rows and then visiting the customer
table a million times is going to be a lot more I/O than just full-scanning the table once. Range-partitioning customer
might help, if you are licensed for it and depending on the data distribution.
Possibly an index on payment(rentalid, amount)
could be treated by the optimiser as a skinny table which would be more efficient to full-scan than the payment
table itself, since those are the only two columns you need from the table, making the join to payment
more efficient. However, that is only one of three tables involved in the query so I wouldn't expect a massive improvement.
I notice in your question you mention that you want the customers who registered in one month before 2019-06-01, but I don't see that condition in your query. If registration date is c.createdate
then perhaps
where c.createdate > date '2019-06-01' + 30
should be something more like
where c.createdate between date '2019-06-01' and date '2019-07-01'
in which case an index on c.createdate
starts to look more useful, but that's a different query.
Upvotes: 1
Reputation: 142720
Would including the HAVING
clause do any good? Because, you have all those values, already so you'd avoid the subquery entirely. Something like this:
SELECT c.createdate,
c.firstname,
c.lastname,
ROUND (SUM (p.amount)) AS spentmoney
FROM customer c
JOIN rental r ON c.customerid = r.customerid
JOIN payment p ON p.rentalid = r.rentalid
GROUP BY c.firstname, c.lastname, c.createdate
HAVING SUM (p.amount) < 50
OR c.createdate > DATE '2019-06-01' + 30
ORDER BY c.firstname, c.lastname;
Upvotes: 0