Karthik S
Karthik S

Reputation: 11548

Why this multiple SQL Server join query fails

I have a table in this format:

PID   TIV_2011  TIV_2012     LAT         LON
123     1000      1200     20.123489   80.341245
456     1500      3000     21.341287   80.341245
789     2000      1500     21.341287   80.341245
321     1000       750     21.123641   80.238716
567     1500      2300     22.123641   80.238716

Question: Write a query to print the sum of all total investment values in 2012 (TIV_2012), to a scale of 2 decimal places, for all policyholders who meet the following criteria:

  1. Have the same TIV_2011 value as one or more other policyholders.

  2. Are not located in the same city as another policyholder (i.e.: the latitude, longitude) attribute pair must be unique).

My Query:

select cast(sum(i1.tiv_2012) as decimal(12,2))
from insurance i1 join insurance i2 on i1.tiv_2011 = i2.tiv_2011
where i1.pid != i2.pid 
and i1.lat != i2.lat 
and i1.lon != i2.lon

Answer is 7250 which is 1200 + 3000 + 750 + 2300.

But the actual answer should be 4250 which is 1200 + 750 + 2300

This question has been answered before and the query that works fine is:

select cast(sum(i1.tiv_2012) as decimal(12,2)) from insurance i1 
join (select TIV_2011 from insurance group by TIV_2011 having count(*) > 1) i2 on i1.tiv_2011 = i2.TIV_2011
join (select lat,LON from insurance group by LAT,LON having COUNT(*) = 1) i3 on i1.LAT = i3.LAT and i1.LON = i3.LON

But can someone please let me know where or why my joins are going wrong?

Upvotes: 0

Views: 208

Answers (4)

Damien_The_Unbeliever
Damien_The_Unbeliever

Reputation: 239824

Prefer to write more straightforward SQL that reflects the requirements. If you want to do this via joins it can quickly get out of hand because the joins will cross multiply with each other and inflate the number of rows. In addition, the two numbered clauses require different conditions to express, with the second only doable with a LEFT JOIN that you're hoping will fail.

I'd prefer to use two EXISTS checks:

declare @t table (PID int, TIV_2011 int, TIV_2012 int, Lat decimal(8,6),Lon decimal(9,6))
insert into @t(PID,TIV_2011,TIV_2012,LAT,LON) values
(123,1000,1200,20.123489,80.341245),
(456,1500,3000,21.341287,80.341245),
(789,2000,1500,21.341287,80.341245),
(321,1000, 750,21.123641,80.238716),
(567,1500,2300,22.123641,80.238716)

select
    CONVERT(decimal(12,2),SUM(TIV_2012))
from
    @t t_keep
where
    exists (select * from @t t_other2011
       where t_keep.PID != t_other2011.PID
       and t_keep.TIV_2011 = t_other2011.TIV_2011
) and
    not exists (select * from @t t_city
       where t_keep.PID != t_city.PID and
       t_keep.Lat = t_city.Lat and
       t_keep.Lon = t_city.Lon)

Select rows from @t (t_keep) where (first EXISTS) there is another row in the table for another policyholder where we share the same TIV_2011 value. But also (and) where (second EXISTS) there isn't a row in the table for another policyholder where we share the same Lat and Lon values.

This produces the 4250 value.

Upvotes: 2

Thorsten Kettner
Thorsten Kettner

Reputation: 95101

This is no task where I would join at all.

select sum(tiv_2012)
from insurance i
where exists
(
  select * 
  from mytable other 
  where other.pid <> i.pid
  and other.tiv_2011 = i.tiv_2011
)
and not exists
(
  select * 
  from mytable other 
  where other.pid <> i.pid
  and other.lat = i.lat
  and other.lon = i.lon
);

I know this doesn't answer your question what is exactly wrong with your joins. In my opinion it's the approach to even use joins here, when you only want to look up other rows in the table. For looking up records we have EXISTS and IN and criteria goes to the WHERE clause as it should.

Upvotes: 1

Denis Rubashkin
Denis Rubashkin

Reputation: 2191

You have two independent conditions but you try to use it together. Split them out using, for instance, a subquery and you will get expected result.

DECLARE @Test TABLE (
    PID         int,
    TIV_2011    int,
    TIV_2012    int,
    LAT         DECIMAL(8,6),        
    LON         DECIMAL(8,6)
) 
INSERT @Test(PID,   TIV_2011,  TIV_2012,     LAT,         LON)
VALUES
(123, 1000, 1200, 20.123489, 80.341245),
(456, 1500, 3000, 21.341287, 80.341245),
(789, 2000, 1500, 21.341287, 80.341245),
(321, 1000,  750, 21.123641, 80.238716),
(567, 1500, 2300, 22.123641, 80.238716)

select cast(sum(i1.tiv_2012) as decimal(12,2))
from @Test i1 
    join @Test i2 on i1.tiv_2011 = i2.tiv_2011
        AND i1.pid != i2.pid 
where 
    NOT EXISTS (
        SELECT 1
        FROM @Test
        WHERE LAT = i1.lat 
            and LON = i1.lon
            AND PID != i1.pid
    )

Upvotes: 1

Omkar
Omkar

Reputation: 392

your this line is different and doesn't make sense.

i1.pid != i2.pid 

That doesn't need to be there.

Upvotes: 0

Related Questions