Reputation: 58
I have a large table (~6M rows, 41 columns) in PostgreSQL, as follows:
id | answer1 | answer2 | answer3 | ... | answer40
1  | xxx     | yyy     | null    | ... | null
2  | xxx     | null    | null    | ... | null
3  | xxx     | null    | zzz     | ... | aaa
Note that there are many null columns in every row, and I only want the values that actually contain data.
I want to normalize it to get this:
id | answers
1 | xxx
1 | yyy
2 | xxx
3 | xxx
3 | zzz
...
3 | aaa
The question is: which is more efficient / faster, a single statement with many unions, or several separate inserts?
Option 1
create table new_table as
select id, answer1 from my_table where answer1 is not null
union
select id, answer2 from my_table where answer2 is not null
union
select id, answer3 from my_table where answer3 is not null
union ...
Option 2
create table new_table as select id, answer1 from my_table where answer1 is not null;
insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
...
Option 3: is there a better way to do this?
Upvotes: 1
Views: 70
Reputation: 28233
Option 2 should be faster: the plain insert ... select statements avoid the duplicate-elimination step (a sort or hash over the whole combined result) that union performs in Option 1.
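For contrast, if you prefer a single statement, union all skips that deduplication entirely. A sketch mirroring Option 1 from the question (the first column is aliased to answers to match the desired output; unlike union, this keeps rows where the same id/answer pair appears in two columns):

create table new_table as
select id, answer1 as answers from my_table where answer1 is not null
union all
select id, answer2 from my_table where answer2 is not null
union all
select id, answer3 from my_table where answer3 is not null
union all ...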
If you do go with Option 2, wrap all the statements in a single begin ... commit block to save the time spent on individual commits.
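A minimal sketch of that wrapping, reusing the statements from the question (the inserts for answer4 through answer40 are elided the same way):

begin;

create table new_table as
select id, answer1 as answers from my_table where answer1 is not null;

insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
-- ... the same insert for answer4 through answer40 ...

commit;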
For faster selects, make sure the columns being filtered (e.g. where answer1 is not null) have indexes.
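Partial indexes would be the natural fit for those filters. A sketch (for a one-off rewrite of the whole table, it's worth checking with explain whether the planner actually uses them, since a sequential scan can still win):

create index on my_table (answer1) where answer1 is not null;
create index on my_table (answer2) where answer2 is not null;
-- ... one per answer column ...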
Upvotes: 1