Reputation: 5383
I have a Postgres database, and I have inserted some data into a table. Because of issues with the internet connection, some of the data couldn't be written. The file that I am trying to write into the database is large (about 330712484 rows - even the wc -l
command takes a while to complete).
Now, the column row_id
is the (integer) primary key, and is already indexed. Since some of the rows could not be inserted into the table, I wanted to insert these specific rows. (I estimate only about 1.8% of the data isn't in the table ...) As a start, I tried to check whether the primary keys were already in the database, like so:
import csv
import psycopg2

conn = psycopg2.connect(connector)
cur = conn.cursor()
with open(fileName) as f:
    header = f.readline().strip()
    header = list(csv.reader([header]))[0]
    print(header)
    for i, l in enumerate(f):
        if i > 10: break
        print(l.strip())
        row_id = l.split(',')[0]
        query = 'select * from raw_data.chartevents where row_id={}'.format(row_id)
        cur.execute(query)
        print(cur.fetchall())
cur.close()
conn.close()
Even for the first few rows of data, checking whether the primary key exists takes a very long time.
What would be the fastest way of doing this?
Upvotes: 1
Views: 863
Reputation: 266
The fastest way to insert data into PostgreSQL is using the COPY protocol, which is implemented in psycopg2. COPY will not allow you to check whether the target id already exists, though. The best option is to COPY your file's contents into a temporary table, then INSERT or UPDATE from there, as in the Batch Update article I wrote on my http://tapoueh.org blog a while ago.
With a recent enough version of PostgreSQL you may use
INSERT INTO ...
SELECT * FROM copy_target_table
ON CONFLICT (pkey_name) DO NOTHING
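A minimal sketch of that flow with psycopg2, assuming the target table raw_data.chartevents and the row_id primary key from the question; the staging table name chartevents_stage and the CSV options are illustrative:

import psycopg2

conn = psycopg2.connect(connector)  # same connection string as in the question
cur = conn.cursor()

# 1. Stage the whole file into a temporary table with the same structure.
cur.execute("""
    CREATE TEMP TABLE chartevents_stage
    (LIKE raw_data.chartevents INCLUDING DEFAULTS)
""")
with open(fileName) as f:
    # COPY ... FROM STDIN streams the file over the existing connection.
    cur.copy_expert(
        "COPY chartevents_stage FROM STDIN WITH (FORMAT csv, HEADER true)", f
    )

# 2. Insert only the rows whose primary key is not already present.
cur.execute("""
    INSERT INTO raw_data.chartevents
    SELECT * FROM chartevents_stage
    ON CONFLICT (row_id) DO NOTHING
""")

conn.commit()
cur.close()
conn.close()

This way there is a single existence check per row, done inside the server as part of the INSERT, instead of one round-trip SELECT per row.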
Upvotes: 2
Reputation: 2081
Can I offer a workaround?
The index will be checked for each row inserted, and Postgres performs the whole insert in a single transaction, so you are effectively storing all this data to disk before it is written.
Could I suggest you drop the indexes to avoid this slowdown, then split the file into smaller files using head -n [int] > newfile
or something similar, then perform the COPY commands separately for each one.
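A rough sketch of that idea in Python, assuming the primary-key constraint is named chartevents_pkey (check with \d raw_data.chartevents), that the rows being loaded are not already in the table, and with an illustrative chunk size and commit-per-chunk pattern:

import io
import psycopg2
from itertools import islice

CHUNK = 1_000_000  # illustrative chunk size

conn = psycopg2.connect(connector)
cur = conn.cursor()

# Dropping the primary key (and its index) avoids the per-row index check;
# it is re-added once the data is loaded.
cur.execute("ALTER TABLE raw_data.chartevents DROP CONSTRAINT chartevents_pkey")
conn.commit()

with open(fileName) as f:
    f.readline()  # skip the header line; COPY below expects raw data rows
    while True:
        chunk = list(islice(f, CHUNK))
        if not chunk:
            break
        buf = io.StringIO(''.join(chunk))
        cur.copy_expert(
            "COPY raw_data.chartevents FROM STDIN WITH (FORMAT csv)", buf
        )
        conn.commit()  # each chunk is committed separately

cur.execute("ALTER TABLE raw_data.chartevents ADD PRIMARY KEY (row_id)")
conn.commit()
cur.close()
conn.close()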
Upvotes: 0