Reputation: 267
I have 2 query which both aims to select all batchNo that follows 3 conditions:
Query 1: Outputs 940 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage,
DateUploaded = MIN(csa.DateUpdated)
FROM
Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
INNER JOIN ClaimStatusAudit csa WITH(NOLOCK) on csa.CLAIMNO = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
GROUP BY bh.BatchNo, bh.Coverage
HAVING MIN(CSA.DateUpdated) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Query 2: Outputs 1314 rows
SELECT
DISTINCT bh.BatchNo,
bh.Coverage
FROM Registration2..BatchHeader bh with(nolock)
INNER JOIN ClaimsProcess..BatchHeader bhc with(nolock) on bhc.BatchNo = bh.BatchNo
INNER JOIN ClaimsInfo ci with(nolock) on ci.BatchNo = bhc.BatchNo
INNER JOIN Claims c with(nolock) on c.ClaimNo = ci.ClaimNo
WHERE c.ClaimStatus in('95','90') AND bhc.CreatedBy = 'PROVIDERLINK'
AND (SELECT MIN(DATEUPDATED) FROM CLAIMSTATUSAUDIT WITH(NOLOCK)WHERE CLAIMNO = ci.ClaimNo) >= convert(varchar(10),GETDATE() -1,110) + ' 15:00:00.000'
Though both got the same logic.. they output different number of rows... I would like to know which among the two is more accurate...
BTW.. Both outputs follow the 3 given conditions..
Upvotes: 1
Views: 95
Reputation: 2220
Your assumption is wrong. These two queries are not employing the same logic, simply because of the order in which each clause is evaluated. Clauses are evaluated in the following order (see here for the full article):
With that detail out of the way, let's analyze why these two queries return a different number of rows.
The reason you're returning a different number of rows is because of when you are filtering for a date prior to after 3pm today.
In Query 1, you're selecting all Batch Numbers and Coverages that meet two conditions: 1. have corresponding records in all joined tables 2. have the desired claim status and were created by "ProviderLink"
You get this list of records once the From, Where, and Group by clauses have been executed.
You are then running the aggregate calculation (Min) on that set of data, pulling the minimum DateUpdated, yet you have not yet put any restriction on how the DateUpdated should be limited. So when you then group your data and filter the groups using the Having clause, you're filtering out all records that meet criteria from numbers 1 and 2 above and also had a DateUpdated prior to 3pm today. Let's look at an example.
Record 1 has a BatchNo 123 and Coverage A and was last updated on 4/4/2014 12:00:00.000
Record 2 has a BatchNo 123 and Coverage A and was last updated today at 5/1/2014 3:01:00.000
Assuming Records 1 & 2 have corresponding records in all joined tables, query 1 will pull back the distinct BatchNo and Coverage (123 & A, respectively) and find the minimum DateUpdated which is 4/4/2014 12:00:00.000. Then, once grouped, your Having clause will say the DateUpdated is not greater than today at 3pm, so it will filter the grouped records out.
Query 2, on the other hand, is taking a different approach. It will see Records 1 and 2 as the same in terms of BatchNo & Coverage because those values are identical. However, in the where clause (i.e., the initial filtering process), it's only looking for records where the minimum DateUpdated is greater than today at 3pm, so it's finding Record 2, and returning it in the dataset.
I think you will find that is the case with the 374 missing records from Dataset 1.
All that said, and with the understanding that we cannot tell you which dataset is better, you'll find that Query 1 will only show groups of distinct BatchNos & Coverages where the minimum DateUpdated among any of the records falling into that group was last updated after 3pm today. This means Query 1 is returning only BatchNos and Coverages which contain very new records.
Query 2 is returning any distinct BatchNo & Coverage groupings where any record within its group was last updated after 3pm today. So which one is right for you?
Upvotes: 1