Reputation: 117
I'm new to python and am facing what seems to be a memory leakage error. I've written a simple script that is trying to fetch multiple columns from a postgres database and then proceeds to perform simple subtraction on these columns and store the result in a temporary variable which is being written to a file. I need to do this on multiple pairs of columns from the db and I'm using a list of lists to store the different column names.
I'm loop over the individual elements of this list until the list is exhausted. While I'm getting valid results(by valid I mean that the output file contains the expected values) for the first few column pairs, the program abruptly gets "Killed" somewhere in between execution. Code below:
varList = [ ['table1', 'col1', 'col2'],
['table1', 'col3', 'col4'],
['table2', 'col1', 'col2'],
# ..
# and many more such lines
# ..
['table2', 'col3', 'col4']]
try:
conn = psycopg2.connect(database='somename', user='someuser', password='somepasswd')
c = conn.cursor()
for listVar in varList:
c.execute("SELECT %s FROM %s" %(listVar[1], listVar[0]))
rowsList1 = c.fetchall();
c.execute("SELECT %s FROM %s" %(listVar[2], listVar[0]))
rowsList2 = c.fetchall();
outfile = file('%s__%s' %(listVar[1], listVar[2]), 'w')
for i in range(0, len(rowsList1)):
if rowsList1[i][0] == None or rowsList2[i][0] == None:
timeDiff = -1
else:
timestamp1 = time.mktime(rowsList1[i][0].timetuple())
timestamp2 = time.mktime(rowsList2[i][0].timetuple())
timeDiff = timestamp2 - timestamp1
outfile.write(str(timeDiff) + '\n')
outfile.close();
del rowsList1, rowsList2
#numpy.savetxt('output.dat', column_stack(rows))
except psycopg2.DatabaseError, e:
print 'Error %s' % e
sys.exit(1)
finally:
if conn:
conn.close()
My initial guess was that there was some form of memory leak and in an attempt to fix this, I added a del statement on the two large arrays hoping that the memory gets properly collected. This time, I got slightly better outputs(by slightly better I mean that more output files were created for the db column pairs). However, after the 10th or 11th pair of columns, my program was "Killed" again. Can someone tell me what could be wrong here. Is there a better way of getting this done? Any help is appreciated.
PS: I know that this is a fairly inefficient implementation as I'm looping many times, but I needed something quick and dirty for proof of concept.
Upvotes: 1
Views: 2066
Reputation: 512
I think the problem here is you are selecting everything and then filtering it in the application code when you should be selecting what you want with the sql query. If you select what you want in the sql query like this:
for listvar in varlist: select listvar[1], listvar[2] from listvar[0] where listvar[1] is not null and listvar[2] is not null
# then...
timeDiff = {}
for row in rows:
timestamp1 = time.mktime(row[0].timetuple())
timestamp2 = time.mktime(row[0].timetuple())
timeDiff[identifier] = timestamp2 - timestamp1 #still need to assoc timediff with row... maybe you need to query a unique identifyer also?
#and possibly a separate... (this may not be necessary depending on your application code. do you really need -1's for irrelevant data or can you just return the important data?)
select listvar[1], listvar[2] from listvar[0] where listvar[1] is null or listvar[2] is null
for row in rows:
timeDiff[identifier] = -1 # or None
Upvotes: 1