dave k
dave k

Reputation: 1389

SQL returns less results when using UNION?

I have a SQL Server stored procedure that doesn't give me all the results when I add in the union and the second half. The first half will give me all 6 results, using the union, I only get 5.

There are 3 patients with their own [chart number], each have 2 items that should be displayed. if the [CPTCode] is the same for both entries of a patient, only one of the two entries show up when i add the union (it gives me all 6 with the same [cptcode] without the union). The second half isn't actually pulling any information with what i'm doing right now, but it is needed.

select      /*Patients with chart numbers*/
   B1.[chart Number],
   '0' as newPatient, 
   isnull(Lytec.[Last Name],'') as [Last Name], 
   isnull(Lytec.[First Name],'') as [First Name],
   isnull(Address.[Name],
   Lytec.[Facility Code]) as [Facility], 
   isnull(B1.DOS,'') as [DOS], 
   isnull(Ins.[Name],
   Lytec.[Primary Code]) as [Primary Code],
   isnull(B1.[CPTCode],'') as cptCode, 
   isnull(B1.[Units],'') as [Units], 
   isnull(B1.[Modifiers],'') as [Modifiers], 
   isnull(B1.[cptCodeOther],'') as [cptCodeOther],
   isnull(B1.[cptCode2],'') as cptCode2, 
   isnull(B1.[Units],'') as [Units2], 
   isnull(B1.[Modifiers2],'') as [Modifiers2], 
   isnull(B1.[cptCodeOther2],'') as [cptCodeOther2],
   'name' as ProviderName
from 
   [sandboxmr].dbo.patient Lytec
   left join [dbo].[Billing] B1 on B1.[Chart Number] = Lytec.[Chart Number]
   left join [sandboxmr].dbo.[Address] Address on Lytec.[Facility Code] = Address.[Code]
   left join [sandboxmr].dbo.[Insurance] Ins on Lytec.[Primary Code] = Ins.[Code]
where 
   b1.[userid] = 54
   and (b1.[DateSubmitted] >= convert(varchar(25),'2011-8-31',101) 
   and b1.[DateSubmitted] <= convert(varchar(25),'2011-9-22',101))

union

select      /*Patients without chart numbers*/
   cast(P.id as varchar(15)) as [chart number], 
   '1' as newPatient,  
   isnull(P.[Last Name],'') as [Last Name], 
   isnull(P.[First Name],'') as [First Name],
   isnull(Address.[Name],
   P.[Facility Code]) as [Facility], 
   isnull(IV.DOS,isnull(SV.DOS,'')) as [DOS], 
   isnull(Ins.[Name],P.[Primary_Code]) as [Primary Code],
   isnull(IV.[cptCode],isnull(SV.[cptCode],'')) as cptCode, 
   isnull(IV.[Units],isnull(SV.[Units],'')) as [Units], 
   isnull(IV.[Modifiers],isnull(SV.[Modifiers],'')) as [Modifiers],    
   isnull(IV.[cptcodeother],isnull(SV.[cptcodeother],'')) as [cptCodeOther],
   isnull(IV.[cptCode2],isnull(SV.[cptCode2],'')) as cptCode2, 
   isnull(IV.Units2,isnull(SV.Units2,'')) as [Units2], 
   isnull(IV.[Modifiers2],isnull(SV.[Modifiers2],'')) as [Modifiers2], 
   isnull(IV.[cptCodeOther2],isnull(SV.[cptCodeOther2],'')) as [cptCodeOther2],
   'Name' as ProviderName
from 
   [DNSList].[dbo].[Patient] P 
   left join [dbo].[InitialVisits] IV on p.emr_id = IV.patientid
   left join [dbo].[SubsequentVisits] SV on p.emr_id = SV.patientid
   left join [sandboxmr].dbo.[Address] Address on P.[Facility Code] = Address.[Code]
left join [sandboxmr].dbo.[Insurance] Ins on P.[Primary_Code] = Ins.[Code]
where 
   p.[userid] = 54 
   and p.[Chart Number] is null
   and (p.[DateSubmitted] >= convert(varchar(25),'2011-8-31',101) 
   and p.[DateSubmitted] <= convert(varchar(25),'2011-9-22',101))
order by 
   [Last Name]

Why does it do this, and how can I fix it? I've tried adding a distinct to the [cptcode] area, but it of course generates an error.

Thanks for any help you can provide!

Upvotes: 42

Views: 33352

Answers (3)

Manngo
Manngo

Reputation: 16281

A little background here. Technically a table is a set of rows. This is in the mathematical sense of sets.

Two important properties of mathematical sets are:

  • A set is unordered
  • A set has no duplicates

The first point is why row order is not reliable unless you add an ORDER BY clause. The second is one reason you should always have a primary key to ensure that each rows is indeed distinct.

The UNION operation is a set operation and (a) combines two sets, (b) produces a set. In order to maintain the integrity of a set the UNION will remove duplicate rows.

Unfortunately, there is no real way of guaranteeing whether two rows are supposed to be same, so SQL will simply compare the values in the SELECT clauses. If those values are the same, then SQL decides there is a duplicate.

This can lead to an exaggerated example:

SELECT state FROM customers
UNION
SELECT state FROM customers;

In principle, the state value is produced twice, duplicate values are removed, and what you have is a long-winded way of saying SELECT DISTINCT.

The set behaviour is also why you can’t have an ORDER BY clause in the SELECT statements: once you order a data set, it is no longer a true set.

(Yes, you can have an ORDER BY clause, but it isn’t attached to a SELECT statement, but rather to the resulting UNION).

Of course, SQL isn’t just about pure mathematics, so it includes two features which break the set:

  • You can order a set with an ORDER BY clause.

    However, you can only do this after the rest of the work is complete, which is why it’s the last clause.

  • You can include duplicates using the UNION ALL clause.

Upvotes: 3

Josh Darnell
Josh Darnell

Reputation: 11433

I believe you need to use UNION ALL. UNION suppresses duplicates by default

Upvotes: 34

Joe Stefanelli
Joe Stefanelli

Reputation: 135799

UNION will eliminate duplicate rows, whether they come between the two result sets or, as in your case, within just one result set. Try using UNION ALL instead.

Upvotes: 107

Related Questions