ajaanbaahu
ajaanbaahu

Reputation: 344

How to extract all rows from a large postgres table using python efficiently?

I have been able to extract close to 3.5 mil rows from a postgres table using python and write to a file. However the process is extremely slow and I'm sure not the most efficient. Following is my code:

import psycopg2, time,csv
conn_string = "host='compute-1.amazonaws.com' dbname='re' user='data' password='reck' port=5433"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
quert = '''select data from table;'''
cursor.execute(quert)

def get_data():
    while True:
        recs = cursor.fetchmany(10000)

        if not recs:
            break

        for columns in recs:
            # do transformation of data here
            yield(columns) 

solr_input=get_data()

with open('prc_ind.csv','a') as fh:
    for i in solr_input:
        count += 1

        if count % 1000 == 0:
             print(count)

         a,b,c,d = i['Skills'],i['Id'],i['History'],i['Industry']
         fh.write("{0}|{1}|{2}|{3}\n".format(a,b,c,d))

The table has about 8 mil rows. I want to ask is there is a better, faster and less memory intensive way to accomplish this.

Upvotes: 0

Views: 2959

Answers (2)

bucket
bucket

Reputation: 61

Psycopg2's copy_to command does the exact same thing as a psql dump, as Loïc suggested, except it's in the python side of things. I've found this to be the fastest way to get a table dump.

The formatting for certain data types (such as hstore/json and composite types) is a bit funky, but the command is very simple.

f = open('foobar.dat', 'wb')
cursor.copy_to(f, 'table', sep='|', columns=['skills', 'id', 'history', 'industry'])

Docs here: http://initd.org/psycopg/docs/cursor.html#cursor.copy_to

Upvotes: 3

Loïc
Loïc

Reputation: 11942

I can see four fields, so I'll assume you are selecting only these.

But even then, you are still loading 8 mil x 4 x n Bytes of data from what seems to be another server. So yes it'll take some time.

Though you are trying to rebuild the wheel, why not use the PostgreSQL client?

psql -d dbname -t -A -F"," -c "select * from users" > output.csv

Upvotes: 3

Related Questions