Adnan Siddiquei

Reputation: 483

Is it faster and more memory efficient to manipulate data in Python or PostgreSQL?

Say I had a PostgreSQL table with 5-6 columns and a few hundred rows. Would it be more effective to use psycopg2 to load the entire table into my Python program and then use Python to select and order the rows I want? Or would it be more effective to use SQL to select the required rows, order them, and load only those specific rows into my Python program?

By 'effective' I mean in terms of:

  1. Memory Usage.
  2. Speed.

Additionally, how would these factors start to vary as the size of the table increases? Say, the table now has a few million rows?
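For concreteness, a minimal sketch of the two approaches being compared (the database name, table, columns, and connection details are all invented for illustration):

    import psycopg2

    # All names here (mydb, my_table, id/name/score) are invented for illustration.
    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # Approach 1: load the whole table, then filter and sort in Python.
    cur.execute("SELECT id, name, score FROM my_table")
    rows = cur.fetchall()                  # every row now lives in Python memory
    wanted = sorted((r for r in rows if r[2] > 10), key=lambda r: r[2])

    # Approach 2: let PostgreSQL filter and sort; fetch only the result.
    cur.execute(
        "SELECT id, name, score FROM my_table WHERE score > %s ORDER BY score",
        (10,),
    )
    wanted = cur.fetchall()                # only the matching rows cross the wire

    cur.close()
    conn.close()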

Upvotes: 5

Views: 1713

Answers (2)

Gordon Linoff

Reputation: 1271231

Actually, if you are comparing data that is already loaded into memory to data being retrieved from a database, then the in-memory operations are often going to be faster. Databases have overhead:

  • They run in separate processes on the same server or on a different server, so data and commands need to move between them.
  • Queries need to be parsed and optimized.
  • Databases support multiple users, so other work may be going on, using up resources.
  • Databases maintain ACID properties and data integrity, which can add additional overhead.

The first two of these in particular add overhead compared to equivalent in-memory operations for every query.
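A rough, hedged way to measure that overhead yourself, reusing the invented table from the question (the numbers will vary with your data, hardware, and network):

    import time

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # The in-memory case assumes the data has already been loaded once.
    cur.execute("SELECT id, name, score FROM my_table")
    rows = cur.fetchall()

    t0 = time.perf_counter()
    in_memory = [r for r in rows if r[2] > 10]      # no parsing, no round trip
    t1 = time.perf_counter()

    cur.execute("SELECT id, name, score FROM my_table WHERE score > %s", (10,))
    via_db = cur.fetchall()                         # pays parse, plan and network cost
    t2 = time.perf_counter()

    print(f"in-memory: {t1 - t0:.6f}s  database: {t2 - t1:.6f}s")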

That doesn't mean that databases do not have advantages, particularly for complex queries:

  • They implement multiple different algorithms and have an optimizer to choose the best one (see the EXPLAIN sketch after this list).
  • They can take advantage of more resources -- particularly by running in parallel.
  • They can (sometimes) cache results, saving lots of time.
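
For the first point, one way to see which algorithm the optimizer picked is to ask for the query plan. This sketch reuses the invented table and connection details from the question; the plan you get depends on your data and indexes:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # EXPLAIN prints the plan the optimizer chose: a sequential scan, an
    # index scan, parallel workers, and so on.
    cur.execute(
        "EXPLAIN SELECT id, name, score FROM my_table "
        "WHERE score > 10 ORDER BY score"
    )
    for (line,) in cur.fetchall():
        print(line)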

The advantage of databases is not that they provide the best performance all the time. The advantage is that they provide good performance across a very wide range of requests with a simple interface (even if you don't like SQL, I think you need to admit that it is simpler, more concise, and more flexible than writing code in a 3rd-generation language).

In addition, databases protect data, via ACID properties and other mechanisms to support data integrity.

Upvotes: 4

toastrackengima

Reputation: 8762

It's almost always going to be faster to perform all of these operations in PostgreSQL. These database systems have been designed to scale well for huge amounts of data, and are highly optimised for their typical use cases. For example, they don't have to load all of the data from disk to perform most basic filters[1].

Even if this were not the case, the network latency and bandwidth usage alone would be enough to outweigh any speed advantage on the Python side, especially if you were running the query often.
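
As an illustrative sketch of pushing the filtering and ordering to PostgreSQL, reusing the invented table from the question (IF NOT EXISTS needs PostgreSQL 9.5+):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # With an index on the filter/sort column, PostgreSQL can often answer
    # the query below without reading the whole table from disk.
    cur.execute("CREATE INDEX IF NOT EXISTS my_table_score_idx ON my_table (score)")
    conn.commit()

    cur.execute(
        "SELECT id, name, score FROM my_table "
        "WHERE score > %s ORDER BY score LIMIT 100",
        (10,),
    )
    top_rows = cur.fetchall()   # only the matching, already-ordered rows return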

Upvotes: 7
