hero121
hero121

Reputation: 121

Insert several thousand rows into table with values from other tables

I have a PostgreSQL database with 4 tables - Table A, B, C and D. Table A has three columns which are IDs from the other three tables.

Something like this:

TABLE A
--------------------------
| id | B_id | C_id | D_id |
---------------------------

I already have several thousand rows of data in tables B, C, D, and now want to generate an insert script for table A, by selecting data from the other 3 based on some conditions.

I tried as follows:

INSERT INTO A (B_id, C_id, D_id) VALUES ((SELECT id FROM B WHERE CONDITION), (SELECT id FROM C WHERE CONDITION), (SELECT id FROM D WHERE CONDITION)),((SELECT id FROM B WHERE CONDITION), (SELECT id FROM C WHERE CONDITION), (SELECT id FROM D WHERE CONDITION)),((SELECT id FROM B WHERE CONDITION), (SELECT id FROM C WHERE CONDITION), (SELECT id FROM D WHERE CONDITION)), ....;

However, with a big amount of rows this takes ages and then fails.

I am wondering if I'm doing this right and if not, what would be the most efficient way of achieving what I want.

More detailed info and example data

Accounts
---------------------------------------------
| id | firstName | lastName | email (unique)|
---------------------------------------------
  1     Account      One      [email protected]

Groups
----------------------
| id | name (unique) |
----------------------
  1      Group One
Titles
----------------------
| id | name (unique) |
----------------------
  1      Title One

Table A then contains an id from each of these as mentioned:

A
------------------------------------
| id | accountId | groupId | titleId
------------------------------------
  1        1          1          1

For each account (each unique email), I programmatically generate an insert statement

INSERT INTO A (accountId, groupId, titleId)
VALUES (
    (SELECT id FROM accounts WHERE email = {EMAIL}),
    (SELECT id FROM groups WHERE name = {groupName}),
    (SELECT id FROM titles WHERE name = {titleName}))

As mentioned there will be several thousand (aprox 15000) insert statements generated. The condition values are programmatically added from some parsed data in my script.

The select statements will always return exactly one row because where conditions are done on columns with unique constraints. Therefore each account email should have exactly one combination of account + group + title in table A.

Upvotes: 2

Views: 744

Answers (2)

Abelisto
Abelisto

Reputation: 15614

Create single temporary table for all {EMAIL}, {groupName}, {titleName} triplets and fill it. Lets say it would be:

create temp table filler (
    email text,
    group_name text,
    title_name text);

insert into filler values
    ('email1', 'group1', 'title1'),
    ('email2', 'group2', 'title2'),
    ....
    ('emailn', 'groupn', 'titlen');

Or use copy.

Then insert values in single statement using it:

insert into A (accountId, groupId, titleId)
select a.id, g.id, t.id
from
    filler as f
        left join accounts as a on (f.email = a.email)
        left join groups as g on (f.group_name = g.name)
        left join titles as t on (f.title_name = t.name);

PS: Indexes on accounts.email, groups.name and titles.name could be useful.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269463

Your values expression is set up to insert exactly one row. Each subquery is a scalar subquery that can return at most one row. The overall query will fail, if any of them return more than one row.

Perhaps you want a Cartesian Product of the values returned by all subqueries. If so, use insert . . . select:

INSERT INTO A (B_id, C_id, D_id) 
    SELECT b.id, c.id, d.id
    FROM (SELECT id FROM B WHERE CONDITION) b CROSS JOIN
         (SELECT id FROM C WHERE CONDITION) c CROSS JOIN
         (SELECT id FROM D WHERE CONDITION) d;

Run the SELECT first to be sure it generates the rows you intend to insert.

Upvotes: 0

Related Questions