alexanderkuk
alexanderkuk

Reputation: 1701

Group by fuzzy criterion

Is there any way do this task in SQL:

Having a table of luggage names and its weights that looks like this

 luggage_id | luggage_name | luggage_weight   
------------+--------------+----------------  
          1 | Tom          |              2  
          2 | Kat          |              3  
          3 | Lil          |              4  

ouput groups of things that differ in weight for less then or exactly D units. So if D equals to 1 we have to get

1. Tom, Kat
2. Kat, Lil

and if D equals to 2 we have to get

1. Tom, Kat, Lil

and if D is 0.5 we get

1. Tom
2. Kat
3. Lil

I tried to use cross join:

SELECT t1.luggage_name, t2.luggage_name
FROM Luggages t1, Luggages t2
WHERE ABS(t1.luggage_weight - t2.luggage_weight) <= 1

that resulted in


 luggage_name | luggage_name 
--------------+--------------
 Tom          | Tom
 Tom          | Kat
 Kat          | Tom
 Kat          | Kat
 Kat          | Lil
 Lil          | Kat
 Lil          | Lil

Is it possible to solve this problem with SQL?

EDIT Ok, yes, we can write

SELECT t1.luggage_name, t2.luggage_name
FROM Luggages t1, Luggages t2
WHERE ABS(t1.luggage_weight - t2.luggage_weight) <= 1
      AND t1.luggage_id < t2.luggage_id

then we will get

 luggage_name | luggage_name 
--------------+--------------
 Tom          | Kat
 Kat          | Lil

That means that Tom is in group with Kat and Kat is in group with Lil.

But, let's say we have table with all the same weights


 luggage_id | luggage_name | luggage_weight 
------------+--------------+----------------
          1 | Tom          |              1
          2 | Kat          |              1
          3 | Lil          |              1

Then we will get


 luggage_name | luggage_name 
--------------+--------------
 Tom          | Kat
 Tom          | Lil
 Kat          | Lil

That means that Tom is with Kat and Lil that is correct, but next group, Kat with Lil is already in first group. I thought about this issue and can not figure out how to solve it. How to output only different groups?

Upvotes: 0

Views: 285

Answers (2)

RichardTheKiwi
RichardTheKiwi

Reputation: 107776

SELECT t1.luggage_name, MIN(t2.luggage_name) GroupsWith
FROM Luggages t1
JOIN Luggages t2 ON ABS(t1.luggage_weight - t2.luggage_weight) <= 1
AND ((t1.luggage_weight < t2.luggage_weight)
  OR (t1.luggage_weight = t2.luggage_weight AND t1.luggage_name < t2.luggage_name))
GROUP BY t1.luggage_name
ORDER BY t1.luggage_name

This will show a chain of grouped users without any duplicates, so if you have

Tom-2, Kat-3, Lil-4, Bob-5

They are all D=1 from each other. If your query has D=1, it will show

Kat Lil
Lil Bob
Tom Kat

Which shows the entire group Tom->Kat->Lil->Bob.

Upvotes: 1

gbn
gbn

Reputation: 432471

You need a condition to restrict the full Cartesian to a triangular join

SELECT t1.luggage_name, t2.luggage_name
FROM Luggages t1, Luggages t2
WHERE ABS(t1.luggage_weight - t2.luggage_weight) <= 1

AND
t1.luggageid < t2.luggageid

This removes unwanted pairs

Edit

You output would then be

 luggage_name | luggage_name 
--------------+--------------
 Tom          | Kat
 Kat          | Lil

When d=0.5 then you have no pairs though because there is no difference less than 0.5. A list of singletons is actually wrong when compared to the output from d <= 1

Edit 2: Proof for those you don't get it...

DECLARE @luggage TABLE (luggageid int IDENTITY(1,1), luggage_name varchar(30), luggage_weight float)
INSERT @luggage (luggage_name, luggage_weight) VALUES ('Tom', 2), ('Kat', 3), ('Lil', 4)

SELECT t1.luggage_name, t2.luggage_name
FROM @luggage t1, @luggage t2
WHERE ABS(t1.luggage_weight - t2.luggage_weight) <= 1
AND
t1.luggageid < t2.luggageid


SELECT t1.luggage_name, t2.luggage_name
FROM @luggage t1, @luggage t2
WHERE ABS(t1.luggage_weight - t2.luggage_weight) <= 2
AND
t1.luggageid < t2.luggageid


SELECT t1.luggage_name, t2.luggage_name
FROM @luggage t1, @luggage t2
WHERE ABS(t1.luggage_weight - t2.luggage_weight) <= 0.5
AND
t1.luggageid < t2.luggageid

Upvotes: 3

Related Questions