Reputation: 466
I have a datamining problem with a very large number of attributes (>15,000). I represent the data in multiple tables with a common Id field. Having deleted empty attributes, I want to merge the resulting smaller tables so I can calculate correlations to remove redundent attributes.
I can join the tables easily with:
select * from lrg_bin0 k0
join lrg_bin1 k1 on k0.id = k1.id
join lrg_bin2 k2 on k0.id = k2.id
join lrg_bin3 k3 on k0.id = k3.id
but when I try to create a new table with:
Create table mrg0 as (
select * from lrg_bin0 k0
join lrg_bin1 k1 on k0.id = k1.id
join lrg_bin2 k2 on k0.id = k2.id
join lrg_bin3 k3 on k0.id = k3.id
)
Postgress complains with 'column "id" specified more than once'
The very large number of attributes means I am doing everything programatically so answers suggesting queries I can generate would be appreciated.
Upvotes: 2
Views: 8444
Reputation: 125424
Use using
in instead of on
create table mrg0 as
select *
from
lrg_bin0 k0
join lrg_bin1 k1 using (id)
join lrg_bin2 k2 using (id)
join lrg_bin3 k3 using (id)
Also, USING implies that only one of each pair of equivalent columns will be included in the join output, not both
Upvotes: 3
Reputation: 286
The problem is that in SQL when you perform a join using on, the matching column will be included once per table it's found in. What you can do is specify which table you want the column from ala:
Create table mrg0 as (
select k1.id, ... from lrg_bin0 k0
join lrg_bin1 k1 on k0.id = k1.id
join lrg_bin2 k2 on k0.id = k2.id
join lrg_bin3 k3 on k0.id = k3.id
)
Upvotes: 2