sudo

Reputation: 5804

Optimizing a long series of SQL update queries with Psycopg2

I need to make a huge number of SQL queries that update or insert rows using Psycopg2. There are no other queries being run in between. Example with a table A having columns name and value:

-- Basically models a list of strings and how many times they "appear"
-- 'foo' is some random value each time, sometimes repeating
insert into A (name, value) select 'foo', 0
    where not exists (select 1 from A where name = 'foo' limit 1);
update A set value = value + 1 where name = 'foo';
-- ... and many more just like this

This is just an example of one type of query I'm running; I'm doing other things too. I'm not looking for a solution that involves reworking my SQL queries.

It's really slow, and Postgres (which is running on another server) is the bottleneck. I've tried various things to make it faster.

Is there some other way I should try involving Psycopg2?

Upvotes: 0

Views: 1467

Answers (1)

Michael Robellard

Reputation: 2358

Sounds like you have a few issues here. The first is that Postgres should not page fault unless it is improperly configured or you are running other services on the machine. A properly configured Postgres instance will use your memory, but it won't page fault.

If you need to insert or update hundreds of thousands of things at a time, you definitely do not want to do that one transaction at a time; as you noted, that will be very slow. In your first example, you are sending each query to the database over the network, waiting for a result, then committing and waiting for that result, once again over the network.

Stringing together multiple statements at a time saves you the one commit per statement and the back-and-forth network traffic, which is why you saw significantly faster performance.
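As a rough illustration (not code from your project), batching with psycopg2 could look something like this, reusing the table A and columns name/value from your example; the connection string and data are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=mydb host=dbserver")  # placeholder DSN
cur = conn.cursor()

names = ['foo', 'bar', 'foo', 'baz']  # example data

# Build one big SQL string out of many upsert pairs, escaping values
# safely with mogrify(), then send it in a single execute() and commit once.
statements = []
for n in names:
    statements.append(cur.mogrify(
        "insert into A (name, value) select %s, 0 "
        "where not exists (select 1 from A where name = %s limit 1);",
        (n, n)).decode())
    statements.append(cur.mogrify(
        "update A set value = value + 1 where name = %s;", (n,)).decode())

# One network round trip for the whole batch, one commit at the end.
cur.execute("\n".join(statements))
conn.commit()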

You can take the stringing together a step further and use COPY if you are doing inserts, or use a VALUES list instead of single INSERT or UPDATE statements.
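A sketch of both options, again assuming the same table A(name, value) and a placeholder connection string; you would use one approach or the other, not both:

import io
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=mydb host=dbserver")  # placeholder DSN
cur = conn.cursor()

rows = [('foo', 3), ('bar', 1), ('baz', 7)]  # example data

# Option 1: one INSERT with a multi-row VALUES list instead of many
# single-row inserts.
execute_values(cur, "insert into A (name, value) values %s", rows)

# Option 2: COPY, usually the fastest way to load many rows at once.
buf = io.StringIO("".join("%s\t%s\n" % (n, v) for n, v in rows))
cur.copy_expert("copy A (name, value) from stdin", buf)

conn.commit()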

The real problem is a design flaw in what you are doing. From the looks of your query, you are implementing a counter in your database. If you are only going to count a few hundred things here or there, that's no big deal, but when you get into the hundreds of thousands or more, it won't work well.

This is where tools like memcached and Redis come in. Both have excellent support for very fast in-memory counters. (If you only have one server, you could just implement a counter in your code.) Once you have things counted, create a process that saves the counts to the database and clears the in-memory counters.
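Here's a rough sketch of that idea with Redis, keeping the question's insert/update pattern for the flush step; the Redis key name, connection details, and helper names are placeholders:

import psycopg2
import redis

r = redis.Redis(host='localhost', port=6379, db=0)  # placeholder Redis

def count(name):
    # Fast in-memory increment; no database round trip per event.
    r.hincrby('counters', name, 1)

def flush_counters():
    # Periodically write the accumulated counts to Postgres and reset them.
    # (A production version would need to handle increments that arrive
    # between hgetall() and delete().)
    conn = psycopg2.connect("dbname=mydb host=dbserver")  # placeholder DSN
    cur = conn.cursor()
    for name, value in r.hgetall('counters').items():
        n = name.decode()
        cur.execute(
            "insert into A (name, value) select %s, 0 "
            "where not exists (select 1 from A where name = %s limit 1)",
            (n, n))
        cur.execute("update A set value = value + %s where name = %s",
                    (int(value), n))
    conn.commit()
    r.delete('counters')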

Upvotes: 1
