Reputation: 483
Say I had a PostgreSQL table with 5-6 columns and a few hundred rows. Would it be more effective to use psycopg2 to load the entire table into my Python program and use Python to select and order the rows I want? Or would it be more effective to use SQL to select the required rows, order them, and load only those specific rows into my Python program?
By 'effective' I mean in terms of:
Additionally, how would these factors start to vary as the size of the table increases? Say, the table now has a few million rows?
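For concreteness, a minimal sketch of the two approaches being compared with psycopg2 (the DSN, the `items` table, and its columns are hypothetical):

```python
import psycopg2

# Hypothetical table: items(id, name, score, created_at); DSN is made up too.
conn = psycopg2.connect("dbname=mydb user=me")
cur = conn.cursor()

# Approach 1: pull the whole table, then filter and sort in Python.
cur.execute("SELECT id, name, score, created_at FROM items")
all_rows = cur.fetchall()
wanted = sorted((r for r in all_rows if r[2] > 10), key=lambda r: r[3])

# Approach 2: let PostgreSQL filter and sort; only matching rows come back.
cur.execute(
    "SELECT id, name, score, created_at FROM items"
    " WHERE score > %s ORDER BY created_at",
    (10,),
)
wanted = cur.fetchall()

cur.close()
conn.close()
```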
Upvotes: 5
Views: 1713
Reputation: 1271231
Actually, if you are comparing data that is already loaded into memory to data being retrieved from a database, then the in-memory operations are often going to be faster. Databases have overhead:
The first two of these in particular add overhead compared to equivalent in-memory operations for every query.
That doesn't mean that databases do not have advantages, particularly for complex queries:
The advantage of databases is not that they provide the best performance all the time. The advantage is that they provide good performance across a very wide range of requests with a simple interface (even if you don't like SQL, I think you need to admit that it is simpler, more concise, and more flexible than writing code in a 3rd-generation language).
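For example, a grouped average is a single declarative statement in SQL, while the in-memory version has to pull every row over the network and do the bookkeeping by hand. A minimal sketch, assuming a hypothetical `items` table with `category` and `score` columns:

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
cur = conn.cursor()

# One declarative statement: PostgreSQL plans and executes the aggregate.
cur.execute("SELECT category, avg(score) FROM items GROUP BY category")
averages_sql = dict(cur.fetchall())

# The same result computed by hand after fetching every row.
cur.execute("SELECT category, score FROM items")
totals = {}
for category, score in cur.fetchall():
    running_sum, count = totals.get(category, (0.0, 0))
    totals[category] = (running_sum + score, count + 1)
averages_py = {c: s / n for c, (s, n) in totals.items()}

cur.close()
conn.close()
```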
In addition, databases protect data, via ACID properties and other mechanisms to support data integrity.
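A minimal sketch of that point with psycopg2, assuming a hypothetical `accounts` table: using the connection as a context manager commits on success and rolls back on an exception, so the two updates are applied together or not at all.

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
try:
    with conn:  # commit on success, rollback if an exception is raised
        with conn.cursor() as cur:
            cur.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                (100, 1),
            )
            cur.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                (100, 2),
            )
finally:
    conn.close()
```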
Upvotes: 4
Reputation: 8762
It's almost always going to be faster to perform all of these operations in PostgreSQL. These database systems have been designed to scale well for huge amounts of data, and are highly optimised for their typical use cases. For example, they don't have to load all of the data from disk to perform most basic filters[1].
Even if this were not the case, the network latency / usage alone would be enough to balance this out, especially if you were running the query often.
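As a sketch of why the database does not need to read everything, an index on the filtered column lets PostgreSQL use an index scan and send back only the matching rows (table, column, and index names are hypothetical):

```python
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
with conn:
    with conn.cursor() as cur:
        # With this index in place, the filter below can be answered by an
        # index scan instead of reading every row from disk.
        cur.execute("CREATE INDEX IF NOT EXISTS items_score_idx ON items (score)")
        cur.execute(
            "SELECT id, name, score FROM items WHERE score > %s ORDER BY score",
            (10,),
        )
        matching = cur.fetchall()
conn.close()
```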
Upvotes: 7