user1050619

Reputation: 20856

PostgreSQL performance issues on INSERT

I have created a table in PostgreSQL and here is the definition -

CREATE TABLE "Scratch"
( id uuid NOT NULL,
  text_1 text,
  text_2 text,
  text_3 text,
  text_4 text,
  ts time with time zone,
  CONSTRAINT pk PRIMARY KEY (id)
);

Now, I have used a Python program to insert 1 million rows, with a 2000-byte text value in each of the text_* columns. Here is my script -

import string
import random
import time

import psycopg2

conn = psycopg2.connect(database="Test", user="postgres", password="postgres", host="localhost", port="5432")
print "connection success"

cur = conn.cursor()
start = time.time()
for each in range(1000000):
    text_1 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(2000))
    text_2 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(2000))
    text_3 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(2000))
    text_4 = ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(2000))

    query = """INSERT INTO "Scratch" (id,text_1,text_2,text_3,text_4,ts) \
          VALUES (uuid_generate_v4(),'{0}','{1}','{2}','{3}',current_timestamp)""".format(text_1, text_2,text_3,text_4);

    cur.execute(query)

conn.commit()

end = time.time()

print end - start
print "Load complete"

The insert takes -

end - start = 23000 seconds ≈ 384 minutes

I can increase the performance of the inserts by using bulk inserts or by changing how often I commit, but what I'm really worried about is the number of minutes it takes to run a SELECT over 1 million rows.
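For example, I could batch the rows with psycopg2's execute_values helper (from psycopg2.extras) instead of executing one INSERT per row; here is a rough sketch of the loop above, where the batch size of 1,000 is just a guess:

from psycopg2.extras import execute_values

insert_sql = """INSERT INTO "Scratch" (id, text_1, text_2, text_3, text_4, ts) VALUES %s"""
row_template = "(uuid_generate_v4(), %s, %s, %s, %s, current_timestamp)"

batch = []
for each in range(1000000):
    # one tuple of four 2000-character random strings per row
    batch.append(tuple(''.join(random.choice(string.ascii_uppercase + string.digits)
                               for _ in range(2000)) for _ in range(4)))
    if len(batch) == 1000:
        execute_values(cur, insert_sql, batch, template=row_template)
        batch = []

if batch:
    execute_values(cur, insert_sql, batch, template=row_template)

conn.commit()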

It's been 20 minutes now and I still haven't seen the result of this simple query -

SELECT id, text_1, text_2, text_3, text_4, ts
  FROM "Scratch";

I'm sure it is doing a full table scan.

But how can I improve the performance of this table? I'm planning to add an index on the "ts" field, but how can I force this simple query to use that new index?
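For reference, this is roughly what I have in mind (the index name is arbitrary, and the range filter is only an example of a query that could actually use the index; since ts is declared time with time zone, I compare against current_time):

CREATE INDEX scratch_ts_idx ON "Scratch" (ts);

-- An index on ts only helps a query that filters or orders on ts;
-- the unfiltered SELECT above will still use a sequential scan.
EXPLAIN ANALYZE
SELECT id, text_1, text_2, text_3, text_4, ts
  FROM "Scratch"
 WHERE ts >= current_time - interval '5 minutes';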

What would be the right approach?

Upvotes: 0

Views: 85

Answers (1)

Gordon Linoff

Reputation: 1269753

Of course your query is doing a full table scan. It is returning all the columns from all the rows in the table. The issue is probably not PostgreSQL, but consuming the data being returned -- that is a lot of data.

Perhaps a simple query like this would help you understand performance:

select count(*)
from "Scratch";

Or even something like:

SELECT id, text_1, text_2, text_3, text_4, ts
FROM "Scratch"
LIMIT 10;
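If you really do need to pull every row into your Python program, a server-side (named) cursor in psycopg2 streams the result in batches instead of loading roughly 8 GB of text into client memory at once. Something along these lines, where the cursor name and batch size are arbitrary:

cur = conn.cursor(name='scratch_stream')  # giving the cursor a name makes it server-side
cur.itersize = 1000                       # rows fetched from the server per round trip
cur.execute('SELECT id, text_1, text_2, text_3, text_4, ts FROM "Scratch"')

for row in cur:
    pass  # process one row at a time here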

Upvotes: 2
