Reputation: 35
I'm very new to Postgresql and Python and I'm having some issues understanding foreign keys (which I think is what I use here). I've had a look at an example from here but I don't think that is exactly what I need
As a simple example, I have some information in an existing table:
[ID REFERENCE REF_AGE DATA1 DATA 2]
[1 JOHN 50 50 60 ]
[2 JOHN 50 55 30 ]
[3 TOM 60 60 10 ]
[4 MATT 30 76 57 ]
[5 MATT 30 45 47 ]
I want to make two new tables from this. One including the ID and data with a reference_id that links to the other new table - a reference table, where I can store the other information about each reference (e.g. the age above).
Table 1:
[ID REF_ID DATA1 DATA 2]
[1 1 50 60 ]
[2 1 55 30 ]
[3 2 60 10 ]
[4 3 76 57 ]
[5 3 45 47 ]
Table 2:
[REF_ID NAME AGE ]
[1 JOHN 50 ]
[2 TOM 60 ]
[3 MATT 30 ]
Can anyone show me how to split existing data like this? Separate the unique values from the original tables reference column into the new reference table, and insert corresponding ref_id into the other new table?
Upvotes: 0
Views: 133
Reputation: 125214
There is the recipe. But you have a problem if the person names are not unique.
drop table if exists not_normalized cascade;
create table not_normalized (
id int, reference text, ref_age int, data1 int, data2 int
);
insert into not_normalized (id, reference, ref_age, data1, data2) values
(1,'JOHN',50,50, 60 ),
(2,'JOHN',50,55, 30 ),
(3,'TOM',60,60, 10 ),
(4,'MATT',30,76, 57 ),
(5,'MATT',30,45, 47 ),
(6,null,null,42,50);
drop table if exists referenced cascade;
create table referenced (
ref_id serial primary key,
name text,
age int
);
Selecting the distinct pair (name, age) minimizes the name collision problem:
insert into referenced (name, age)
select distinct reference, ref_age
from not_normalized
where (reference, ref_age) is not null
;
table referenced;
reference | ref_age
-----------+---------
JOHN | 50
TOM | 60
MATT | 30
drop table if exists referencer;
create table referencer (
id serial primary key,
ref_id int references referenced (ref_id),
data1 int, data2 int
);
Again use the age to minimize collisions:
insert into referencer (ref_id, data1, data2)
select r.ref_id, data1, data2
from
not_normalized nn
left join
referenced r on r.name = nn.reference and r.age = nn.ref_age
;
table referencer;
id | ref_id | data1 | data2
----+--------+-------+-------
1 | 1 | 50 | 60
2 | 1 | 55 | 30
3 | 3 | 76 | 57
4 | 3 | 45 | 47
5 | 2 | 60 | 10
6 | | 42 | 50
Upvotes: 2