Jason
Jason

Reputation: 359

Postgres Create Table As Select query with multiple left outer joins produces duplicated data

I have several tables that have a UDID (unique ID) and some other information. I am trying to combine these tables such that every UDID appears once in the final table. The data in all of my input tables looks correct and makes good theoretical sense. However, when I look at output table I'm seeing the same data over and over.

Here is the query:

create table roi_wide_ss_gen_all as
select a.udid, a.src, a.tm8,a.tm7,a.tm6,a.tm5,a.tm4,a.tm3,a.tm2,a.tm1, a.t1, a.t2, a.t3, a.t4, a.t5,a.t6,a.t7,a.t8 
       , pf_m, female, asam, pf_50, pf_150, pf_250, pf_251 
from roi_wide_ss_gen a
    left outer join roi_wide_ss_gen_m b on (a.udid = b.udid)
    left outer join roi_wide_ss_gen_f c on (a.udid = c.udid)
    left outer join roi_wide_ss_gen_k d on (a.udid = d.udid) 
    left outer join roi_wide_ss_gen_50 e on (a.udid = e.udid)  
    left outer join roi_wide_ss_gen_150 f on (a.udid = f.udid) 
    left outer join roi_wide_ss_gen_250 g on (a.udid = g.udid)  
    left outer join roi_wide_ss_gen_251 h on (a.udid = h.udid)
    left outer join roi_wide_ss_gen_as i on (a.udid = i.udid)
;

Here are the first few rows of the output table:

             udid src tm8 tm7 tm6 tm5 tm4 tm3 tm2 tm1 t1 t2 t3 t4 t5 t6 t7 t8 pf_m female asam pf_50 pf_150 pf_250 pf_251
 2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0
 2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0
 2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0
 2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0
 2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0

if I let this continue for 100 rows then the data does eventually change a bit, like this

2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0         
2b4821ecf223b1f6   1   0   0   0   0   4   6   2   0  0  0  0  0  0  0  0  0    0      1       1     0      0      0      0                         
a6ce599b8344bb4c   1   0   0   0   0   1   0   0   0  0  0  0  0  0  0  0  0    1      1       0     0      0      0      0
3f1448b00f8d8031   0   0   0   0   0   0   0   0   1  0  1  1  1  0  0  0  0    1      1       0     0      0      0      0
fca0bd81bdc66de5   0   0   0   0   0   0   0   0   0  0  0  0  0  0  0  0  1    1      1       0     0      0      0      0
fca0bd81bdc66de5   0   0   0   0   0   0   0   0   0  0  0  0  0  0  0  0  1    1      1       0     0      0      0      0

but it's still not 1 row per UDID like it should be for most UDID's. I could've sworn I had this working correctly in the past, and yet...

Upvotes: 1

Views: 309

Answers (1)

roman
roman

Reputation: 117540

There're should be a duplicate udid in one of your tables - if you have no unique constraint on udid, check result of

select udid from ... group by udid having count(*) > 1

on your tables to find out

Upvotes: 1

Related Questions