Reputation: 607
I need to join CTE and another table (temp.holder
) to get unique records but somehow I am not getting unique records and I am not able to figure out what's wrong. I see lot of them are duplicated.
Below is my query:
with data as (....)
select cola, date_part(week, trunc(CONVERT_TIMEZONE('UTC', 'PST8PDT', starttimestamp)) + 6) as colb, colc, cold, cole, colf, colg, colh, coli, count(*) as process_type
from data join
temp.holder x
on data.cola = x.cola
where colj NOT IN ('ABC', 'DEF', 'GHI')
AND colb = extract(week from current_date) + 1
group by data.cola, starttimestamp, colc, cold, cole, data.colf, colg, colh, coli
order by data.cola;
But when I run this query, I don't see any duplicates.
with data as (....)
selec * from data;
Is there any problem in my first query?
Also how should I go ahead and debug out what's wrong in my first query? As of now I dump output of my query in csv file and then I use my python script to figure out if there are any rows being duplicated.
I am trying to convert this:
ColA ColB ColC ColD ColE COLF COLG COLH COLI COLJ
-----------------------------------------------------------------------------
a1 b1 c1 d1 e1 f1 g1 h1 i1 PART
a1 b1 c1 d1 e1 f1 g1 h1 i1 TYPE
into this:
ColA ColB ColC ColD ColE COLF COLG COLH COLI process_type
-------------------------------------------------------------------------------------
a1 b1 c1 d1 e1 f1 g1 h1 i1 2
Note: I am dealing with 80k+ rows. Also starttimestamp
column is timestamp without time zone column.
Upvotes: 0
Views: 197
Reputation: 1269463
The problem is your group by
. You are grouping on the original time column rather than the calculated one in the select
. So try this:
group by data.cola, colb, colc, cold, cole, data.colf, colg, colh, coli
Upvotes: 1