Delete one row out of two with same values alternating in two different columns

Question

The query goes like this:

select g1.gen_id as 'gen_1', g2.gen_id as 'gen_2', count(*) as 'count'
from gen g1, gen g2, dir d
where g1.gen_id <> g2.gen_id
[other irrelevant where conditions here]
order by g1.gen_id, g2.gen_id;

The output becomes:

# gen_1, gen_2, count
'32', '34', '5'
'34', '32', '5'
'32', '39', '2'
'32', '40', '2'
'32', '42', '1'
'32', '43', '3'
'39', '32', '2'
'43', '32', '3'
'32', '45', '4'
'32', '48', '1'
'40', '32', '2'

As you can see, this occurs because I'm getting the cartesian product of the same table (I have it in the from clause twice). If you'll notice in the output, I have values alternating in two of the columns (the first 2 columns - the third column is irrelevant here). What I want is to remove one row of each one of those duplicates. I didn't paste the entire output, but rest assured this is what happens. I have 442 lines output when they should be 221. I want to remove the "duplicate" lines. Is there a way to do this, because I can't find a way around currently.

Gordon Linoff · Accepted Answer

The solution is to use <. However, I would make a few other changes to the query:

select g1.gen_id as gen_1, g2.gen_id as gen_2, count(*) as cnt
from gen g1 join
     gen g2, dir d
     on g1.gen_id < g2.gen_id
where [other irrelevant where conditions here]
order by g1.gen_id, g2.gen_id;

First, this uses explicit join syntax. Although not strictly necessary, JOIN does a better job than , of expressing what you want to accomplish.

Second, it removes the single quotes around the column names. Only use single quotes for string and date constants. Otherwise, you're code might break, when a column alias is interpreted as a string.

Delete one row out of two with same values alternating in two different columns

Answers (2)

Related Questions