Reputation: 271
For the business that I am working in I would like to get information on our customers. The base information I have on these customers is as follows:
Activation_Date
stored in a Loans
table, datatype is datetime
.ActivityDate
stored in a CustomerDailyLoanActivity_Information
table (a daily loans table to those interested, it is part of a datamart and stores for each day that a customer has been active with our company how much they have paid into their loan, so if a customer has an Activation_Date
of 15-03-2017, it has ActivityDate
s in the CustomerDailyLoanActivity_Information
table from 15-03-2017 up until now whereby each ActivityDate
has a record in another column Sum_Paid_To_Date
how much has been paid up until that ActivityDate
). Datatype of ActivityDate
is date
.What I would like to know is the following, I would like to know how much each customer has paid on 1, or 2, or 3, etc. months after his Activation_Date
. So the query would look something like the following (slightly pseudo-code, the more important part is the WHERE
clause).
SELECT
cldai.Sum_Paid_To_Date,
cldai.ActivityDate,
cldai.Customer_Account_Number
FROM
CustomerLoanDailyActivity_Information cldai
INNER JOIN
Loans l ON l.Customer_Account_Number = cldai.Customer_Account_Number
WHERE
(cldai.ActivityDate = CAST(l.Activation_Date AS date)
OR
cldai.ActivityDate = DATEADD(month, 1, CAST(l.Activation_Date AS date))
OR
cldai.ActivityDate = DATEADD(month, 2, CAST(l.Activation_Date AS date))
OR
cldai.ActivityDate = DATEADD(month, 3, CAST(l.Activation_Date AS date))
)
ORDER BY
l.Customer_Account_Number, cldai.ActivityDate ASC
So the problem is that this query is really really slow (because of the WHERE clause and because the cldai
table is big (~6 GB)) and exits before any data is retrieved. A couple of problems that I have heard, and possible solutions, but haven't worked so far.
ActivityDate
column, which is indexed. I used CONVERT before but that was also really slow. I feel like I need to do the convert/cast though, because the ActivityDate
is of date
type and the Activation_Date
is of datetime
type, so there is a possibility that the time part of the datetime
in Activation_Date
will cause there to be no matches with the ActivityDate
(e.g. Activation_Date
for a given customer is 15-03-2017 09:00:00 so it will never match with ActivityDate
15-03-2017 because this might be converted to datetime
15-03-2017 00:00:00, which will never be equal because of the time
part).Any ideas on how to make this query perform more quickly? Any help would be greatly appreciated.
Upvotes: 1
Views: 456
Reputation: 271
So a massive speedup was obtained by using a LEFT JOIN
instead of an INNER JOIN
and by not ordering the data on the server but on the client side. This reduced the query time from about an hour and 10 minutes to about 1 minute. It seems unbelievable but it's what happened.
Regards,
Tim.
Upvotes: 1
Reputation: 1092
How about splitting this up into two steps? Step one - build a table with the four dates for each customer. Then step two, join this to your main CustomerLoanDailyActivity_Information table on date and customer account number. The second step would have a much simpler join, just an = between the ActivityDate and date entry in the table you have built.
Upvotes: 0
Reputation: 1092
If you are guaranteed to have a record for each day, you could apply use the row_number() function to apply row numbers for each group of customer loan repayment records, and then retrieve rows 1,31,61 and 91? This would avoid any date manipulation.
Upvotes: 0