Skippy le Grand Gourou

Reputation: 7704

Language with fast & flexible (PostgreSQL) DB access?

Hopefully this question is not too open-ended… In a few words: I'm looking for a scripting or programming language with fast but easy access to a database (PostgreSQL).

I want to use results from queries on some tables in a PostgreSQL database as input for some R analysis. The queries are simple SELECT requests (there might be room for improvement in the requests themselves, but for now I'm not looking in that direction; I already did that a while ago), but they run within a loop over the results of a first query. The tables include both numbers and strings, and are thousands if not hundreds of thousands of rows long, so the total number of queries can be quite large.

Obviously I first wrote an R script using RPostgreSQL. However, it takes too much time to be comfortable to use (I'd like to be able to modify and rerun it at any time). I have already optimized this script quite thoroughly, and system.time shows me that most of the time is spent on the DB queries within the loop.
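To make the pattern concrete, here is a minimal sketch of what such a script looks like with RPostgreSQL; the database, table and column names (mydb, events, details, event_id) are hypothetical stand-ins for mine:

    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "mydb")

    # First query: fetch the ids to loop over.
    ids <- dbGetQuery(con, "SELECT id FROM events")$id

    results <- vector("list", length(ids))
    for (i in seq_along(ids)) {
        # One SELECT, i.e. one round trip, per id: with thousands of ids,
        # this inner query is where system.time shows the elapsed time going.
        results[[i]] <- dbGetQuery(con,
            sprintf("SELECT * FROM details WHERE event_id = %d", ids[i]))
    }

    dbDisconnect(con)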

Then, figuring it would be much faster if I used a text file as input for R, I translated this R script into Python, using psycopg2. Unfortunately, the Python script is not much faster than the R one.

Finally I started to write a C++ program using libpq (the libpq-fe.h interface), but I stopped when I realized it was not flexible enough (I mean, I would have to multiply the number of lines of my code by at least 3 or 4 in order to process the query results).

Thus I'm wondering which language (or maybe other R or Python libraries?) would provide the best compromise between speed and flexibility (in terms of handling query results: lists, arrays, string manipulations…) for DB access (namely, PostgreSQL). That is, it needs to be much faster than R+RPostgreSQL and Python+psycopg2, and almost as "flexible".

Thanks for any suggestions (the language has to be Linux-friendly).


Update: Here is a typical timing of the old versus the new code using only the first 500 retrieved elements, after correcting the code for the N+1 issue as suggested by Ryan:

> system.time(source("oldcode.R"));
   user      system      elapsed  
  3.825       0.052      49.363 

> system.time(source("newcode.R"));
   user      system      elapsed 
  1.920       0.140       3.551 

The same for the first 1000 retrieved elements:

> system.time(source("oldcode.R"));
   user      system      elapsed  
  9.816       0.092     100.340 

> system.time(source("newcode.R"));
   user      system      elapsed 
  5.040       0.072       6.695 

Probably worth a change indeed. ;-)
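For reference, the usual fix for an N+1 issue, and a plausible shape for the change behind the newcode.R timings (this is a sketch under the same hypothetical names as above, not necessarily the exact change made here), is to fold the loop into a single JOIN and split the result in R:

    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "mydb")

    # One query instead of one per id; the JOIN keeps whatever
    # filtering the first query applied to events.
    all_rows <- dbGetQuery(con, "
        SELECT d.*
        FROM details d
        JOIN events e ON d.event_id = e.id")

    # One data frame per id, the same shape the old loop produced.
    results <- split(all_rows, all_rows$event_id)

    dbDisconnect(con)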

Upvotes: 2

Views: 632

Answers (1)

Burhan Khalid

Reputation: 174622

To make any interface to the database go fast, optimize your database queries. As you discovered even with your optimized R code, the majority of the time is spent in the database. So you should pick the programming language that you are most familiar and comfortable with, as that will be the fastest you can go as far as the front end is concerned.

However, the overall result (in terms of perceived performance) will be the same no matter what programming language you use. There is no library that can increase the speed of your queries, as that is purely a function of the database. All the library/language will let you do is combine multiple queries into a single transaction, but the results of the queries still depend on your database layout and optimization.
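For what it's worth, a minimal sketch of that with RPostgreSQL, assuming the same hypothetical database and tables as above; grouping statements in one transaction saves per-statement overhead but does not change how fast each individual query runs:

    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "mydb")

    # Wrap several statements in a single transaction.
    dbSendQuery(con, "BEGIN")
    n_events  <- dbGetQuery(con, "SELECT count(*) FROM events")
    n_details <- dbGetQuery(con, "SELECT count(*) FROM details")
    dbSendQuery(con, "COMMIT")

    dbDisconnect(con)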

Simple things such as missing indexes on columns can have a big impact.
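For instance, assuming (hypothetically) that the inner query filters details on event_id, an index on that column lets PostgreSQL replace a sequential scan of the whole table with a fast index lookup:

    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "mydb")

    # One-time DDL: index the column the loop filters on.
    dbSendQuery(con, "CREATE INDEX details_event_id_idx ON details (event_id)")

    dbDisconnect(con)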

Start by running EXPLAIN ANALYZE on your query, and paste the result into a plan visualizer such as explain.depesz.com to see what the database is doing, so you know where to start optimizing.
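A sketch of doing this from R, with a hypothetical query matching the names used above; the printed plan is what you would paste into the visualizer:

    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "mydb")

    # EXPLAIN ANALYZE runs the query and returns the executed plan,
    # one line per row of the result.
    plan <- dbGetQuery(con,
        "EXPLAIN ANALYZE SELECT * FROM details WHERE event_id = 42")
    cat(plan[[1]], sep = "\n")

    dbDisconnect(con)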

Upvotes: 2
