Reputation: 58
I have a large table (~6M rows, 41 columns) in PostgreSQL, as follows:
id | answer1 | answer2 | answer3 | ... | answer40
1  | xxx     | yyy     | null    | ... | null
2  | xxx     | null    | null    | ... | null
3  | xxx     | null    | zzz     | ... | aaa
Note that there are many null columns in every row, and I only want the values that actually contain data.
I want to normalize it to get this:
id | answers
1 | xxx
1 | yyy
2 | xxx
3 | xxx
3 | zzz
...
3 | aaa
The question is: which is more efficient / faster, a single statement with many unions, or several separate inserts?
Option 1
create table new_table as
select id, answer1 from my_table where answer1 is not null
union
select id, answer2 from my_table where answer2 is not null
union
select id, answer3 from my_table where answer3 is not null
union ...
Option 2
create table new_table as select id, answer1 from my_table where answer1 is not null;
insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
...
Option 3: is there a better way to do this?
Upvotes: 1
Views: 70
Reputation: 28233
Option 2 should be faster: the plain insert ... select statements avoid the duplicate-elimination step (a sort or hash over the whole combined result) that union performs in Option 1.
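For contrast, if you prefer a single statement, union all skips that deduplication entirely. A sketch mirroring Option 1 from the question (the first column is aliased to answers to match the desired output; unlike union, this keeps rows where the same id/answer pair appears in two columns):

create table new_table as
select id, answer1 as answers from my_table where answer1 is not null
union all
select id, answer2 from my_table where answer2 is not null
union all
select id, answer3 from my_table where answer3 is not null
union all ...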
If you do go with Option 2, wrap all the statements in a single begin ... commit block to save the time spent on individual commits.
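A minimal sketch of that wrapping, reusing the statements from the question (the inserts for answer4 through answer40 are elided the same way):

begin;

create table new_table as
select id, answer1 as answers from my_table where answer1 is not null;

insert into new_table select id, answer2 from my_table where answer2 is not null;
insert into new_table select id, answer3 from my_table where answer3 is not null;
-- ... the same insert for answer4 through answer40 ...

commit;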
For faster selects, make sure the columns being filtered (e.g. where answer1 is not null) have indexes.
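Partial indexes would be the natural fit for those filters. A sketch (for a one-off rewrite of the whole table, it's worth checking with explain whether the planner actually uses them, since a sequential scan can still win):

create index on my_table (answer1) where answer1 is not null;
create index on my_table (answer2) where answer2 is not null;
-- ... one per answer column ...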
Upvotes: 1