AndyP
AndyP

Reputation: 607

INNER JOIN and using GROUP BY to prevent duplicate results?

I need to join CTE and another table (temp.holder) to get unique records but somehow I am not getting unique records and I am not able to figure out what's wrong. I see lot of them are duplicated.

Below is my query:

with data as (....)
select cola, date_part(week, trunc(CONVERT_TIMEZONE('UTC', 'PST8PDT', starttimestamp)) + 6) as colb, colc, cold, cole, colf, colg, colh, coli, count(*) as process_type
from data join
     temp.holder x
     on data.cola = x.cola
where colj NOT IN ('ABC', 'DEF', 'GHI')
AND colb = extract(week from current_date) + 1
group by data.cola, starttimestamp, colc, cold, cole, data.colf, colg, colh, coli
order by data.cola;

But when I run this query, I don't see any duplicates.

with data as (....)
selec * from data;

Is there any problem in my first query?

Also how should I go ahead and debug out what's wrong in my first query? As of now I dump output of my query in csv file and then I use my python script to figure out if there are any rows being duplicated.

I am trying to convert this:

ColA    ColB    ColC    ColD    ColE    COLF    COLG    COLH    COLI    COLJ
-----------------------------------------------------------------------------
a1      b1      c1      d1      e1      f1      g1      h1      i1      PART
a1      b1      c1      d1      e1      f1      g1      h1      i1      TYPE

into this:

ColA    ColB    ColC    ColD    ColE    COLF    COLG    COLH    COLI    process_type
-------------------------------------------------------------------------------------
a1      b1      c1      d1      e1      f1      g1      h1      i1      2

Note: I am dealing with 80k+ rows. Also starttimestamp column is timestamp without time zone column.

Upvotes: 0

Views: 197

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269463

The problem is your group by. You are grouping on the original time column rather than the calculated one in the select. So try this:

group by data.cola, colb, colc, cold, cole, data.colf, colg, colh, coli

Upvotes: 1

Related Questions