Kassym Dorsel
Kassym Dorsel

Reputation: 4843

I don't understand this SQL behavior for select

Given two tables :

A with row : [1641, 1468] Amongst others B with row : [1641, 1468] Amongst others

If I run this :

select distinct A.c1, B.c2
from A, B
where A.c1 <> B.c1 and A.c2 <> B.c2

I was expecting not to get that tuple back. Although when running the command it returns that tuple.

Is there an explanation for this behavior ? Using SQLite.

Upvotes: 0

Views: 219

Answers (4)

Can't replicate that.

$ sqlite3 t.db
SQLite version 3.7.4
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

sqlite> create table a (c1 integer, c2 integer);
sqlite> create table b (c1 integer, c2 integer);
sqlite> insert into a values (1641, 1468);
sqlite> insert into b values (1641, 1468);
sqlite> select * from a;
1641|1468
sqlite> select * from b;
1641|1468
sqlite> select distinct A.c1, B.c2
   ...> from A, B
   ...> where A.c1 <> B.c1 and A.c2 <> B.c2;
sqlite>

After comments . . .

sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE a (
c1 integer,
c2 integer);
INSERT INTO "a" VALUES(1641,1468);
INSERT INTO "a" VALUES(1,2);
INSERT INTO "a" VALUES(3,4);
CREATE TABLE b (
c1 integer, c2 integer
);
INSERT INTO "b" VALUES(1641,1468);
INSERT INTO "b" VALUES(1,2);
INSERT INTO "b" VALUES(5,6);
COMMIT;
sqlite> select distinct A.c1, B.c2
   ...> from A, B
   ...> where A.c1 <> B.c1 and A.c2 <> B.c2;
1|6
1|1468
3|2
3|6
3|1468
1641|2
1641|6

Still don't get the tuple {1641, 1468}.

If you just want the rows that aren't duplicated you might try something more along these lines.

sqlite> select a.c1, a.c2 from a
   ...> left join b on (b.c1 = a.c1 and b.c2 = a.c2)
   ...> where b.c1 is null and b.c2 is null
   ...> union all
   ...> select b.c1, b.c2 from b
   ...> left join a on (b.c1 = a.c1 and b.c2 = a.c2)
   ...> where a.c1 is null and a.c2 is null;
3|4
1641|73
5|6

Upvotes: 2

ilovefigs
ilovefigs

Reputation: 776

For one thing, your join is producing a Cartesian product.

create table A(c1 int, c2 int);
create table B(c1 int, c2 int);
insert into A(c1, c2) values(1641, 1468), (1, 5);
insert into B(c1, c2) values(1641, 1468), (1, 5);
select * from A, B;

Which will result in:

+------+------+------+------+
| A.c1 | A.c2 | B.c1 | B.c2 |
+------+------+------+------+
| 1641 | 1468 | 1641 | 1468 |
|    1 |    5 | 1641 | 1468 |
| 1641 | 1468 |    1 |    5 |
|    1 |    5 |    1 |    5 |
+------+------+------+------+

So, when running your query, the where clause will restrict the first row and the last row. Which will leave us with:

+------+------+------+------+
| A.c1 | A.c2 | B.c1 | B.c2 |
+------+------+------+------+
|    1 |    5 | 1641 | 1468 |
| 1641 | 1468 |    1 |    5 |
+------+------+------+------+

And running your full query, will result in:

+------+------+
| c1   | c2   |
+------+------+
|    1 | 1468 |
| 1641 |    5 |
+------+------+

But, we can construct some rows so that we will get it regardless since you're using A.c1 and B.c2:

insert into A(c1, c2) values(1641, 5);
insert into B(c1, c2) values(1, 1468);

Which will result in a row with [1641, 1468].

Upvotes: 2

Mustafa Shabib
Mustafa Shabib

Reputation: 740

Out of curiosity, I ran the following test on ms sql server 05:

create table a  
(  
c1 int,  
c2 int  
)  
create table b  
(  
c1 int,  
c2 int  
)   

insert into a values(1641,1468)
insert into a values(1641,1111)
insert into b values(1641,1468)
insert into b values(1112,1345)

select distinct A.c1, B.c2
   from A, B
where A.c1 <> B.c1 and A.c2 <> B.c2;

my results were:

c1   |   c2
-------------
1641 |   1345

As you can see, I did not get that original tuple back which is what Kassym was expecting.

I guess sqllite is different, but I would have expected it to work the way Kassym expected as well.

-m

Upvotes: 0

Albin Sunnanbo
Albin Sunnanbo

Reputation: 47068

If you have

A
---
C1 C1
-----
1  2
1  3

.

B
---
C1 C1
-----
1  2
1  3

The join will first produce

A.C1 A.C2 B.C1 B.C2
-------------------
1    2    1    3
1    3    1    2

With distinct and just the selected columns it will be

A.C1 B.C2
---------
1    3
1    2

I suspect you want something like

select distinct A.c1, B.c2
from A, B
WHERE NOT EXISTS (SELECT 1 FROM B WHERE A.C1 = B.C1 AND A.C2 = B.C2)

Upvotes: 0

Related Questions