Adnan Siddiquei

Reputation: 483

Is it faster and more memory efficient to manipulate data in Python or PostgreSQL?

Say I had a PostgreSQL table with 5-6 columns and a few hundred rows. Would it be more effective to use psycopg2 to load the entire table into my Python program and then use Python to select and order the rows I want? Or would it be more effective to use SQL to select the required rows, order them, and load only those specific rows into my Python program?

By 'effective' I mean in terms of:

  1. Memory Usage.
  2. Speed.

Additionally, how would these factors start to vary as the size of the table increases? Say, the table now has a few million rows?
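For concreteness, a minimal sketch of the two approaches being compared (the database name, table, columns, and connection details are all invented for illustration):

    import psycopg2

    # All names here (mydb, my_table, id/name/score) are invented for illustration.
    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # Approach 1: load the whole table, then filter and sort in Python.
    cur.execute("SELECT id, name, score FROM my_table")
    rows = cur.fetchall()                  # every row now lives in Python memory
    wanted = sorted((r for r in rows if r[2] > 10), key=lambda r: r[2])

    # Approach 2: let PostgreSQL filter and sort; fetch only the result.
    cur.execute(
        "SELECT id, name, score FROM my_table WHERE score > %s ORDER BY score",
        (10,),
    )
    wanted = cur.fetchall()                # only the matching rows cross the wire

    cur.close()
    conn.close()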

Upvotes: 5

Views: 1713

Answers (2)

Gordon Linoff

Reputation: 1271231

Actually, if you are comparing data that is already loaded into memory to data being retrieved from a database, then the in-memory operations are often going to be faster. Databases have overhead:

  • They run in separate processes on the same server or on a different server, so data and commands need to move between them.
  • Queries need to be parsed and optimized.
  • Databases support multiple users, so other work may be going on, using up resources.
  • Databases maintain ACID properties and data integrity, which can add additional overhead.

The first two of these in particular add overhead compared to equivalent in-memory operations for every query.
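A rough, hedged way to measure that overhead yourself, reusing the invented table from the question (the numbers will vary with your data, hardware, and network):

    import time

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # The in-memory case assumes the data has already been loaded once.
    cur.execute("SELECT id, name, score FROM my_table")
    rows = cur.fetchall()

    t0 = time.perf_counter()
    in_memory = [r for r in rows if r[2] > 10]      # no parsing, no round trip
    t1 = time.perf_counter()

    cur.execute("SELECT id, name, score FROM my_table WHERE score > %s", (10,))
    via_db = cur.fetchall()                         # pays parse, plan and network cost
    t2 = time.perf_counter()

    print(f"in-memory: {t1 - t0:.6f}s  database: {t2 - t1:.6f}s")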

That doesn't mean that databases do not have advantages, particularly for complex queries:

  • They implement multiple different algorithms and have an optimizer to choose the best one (see the EXPLAIN sketch after this list).
  • They can take advantage of more resources -- particularly by running in parallel.
  • They can (sometimes) cache results, saving lots of time.
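
For the first point, one way to see which algorithm the optimizer picked is to ask for the query plan. This sketch reuses the invented table and connection details from the question; the plan you get depends on your data and indexes:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # EXPLAIN prints the plan the optimizer chose: a sequential scan, an
    # index scan, parallel workers, and so on.
    cur.execute(
        "EXPLAIN SELECT id, name, score FROM my_table "
        "WHERE score > 10 ORDER BY score"
    )
    for (line,) in cur.fetchall():
        print(line)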

The advantage of databases is not that they provide the best performance all the time. The advantage is that they provide good performance across a very wide range of requests with a simple interface (even if you don't like SQL, I think you need to admit that it is simpler, more concise, and more flexible than writing code in a 3rd-generation language).

In addition, databases protect data, via ACID properties and other mechanisms to support data integrity.

Upvotes: 4

toastrackengima

Reputation: 8762

It's almost always going to be faster to perform all of these operations in PostgreSQL. These database systems have been designed to scale well for huge amounts of data, and are highly optimised for their typical use cases. For example, they don't have to load all of the data from disk to perform most basic filters[1].

Even if this were not the case, the network latency and bandwidth usage alone would be enough to outweigh any speed advantage on the Python side, especially if you were running the query often.
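
As an illustrative sketch of pushing the filtering and ordering to PostgreSQL, reusing the invented table from the question (IF NOT EXISTS needs PostgreSQL 9.5+):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # With an index on the filter/sort column, PostgreSQL can often answer
    # the query below without reading the whole table from disk.
    cur.execute("CREATE INDEX IF NOT EXISTS my_table_score_idx ON my_table (score)")
    conn.commit()

    cur.execute(
        "SELECT id, name, score FROM my_table "
        "WHERE score > %s ORDER BY score LIMIT 100",
        (10,),
    )
    top_rows = cur.fetchall()   # only the matching, already-ordered rows return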

Upvotes: 7
