Reputation: 5053
I have a db with a million rows. I want to fetch all the rows, do some operation on them, and insert them into another table (newTable).
I figured out I need to use a server-side cursor, since I cannot fetch all the data into memory, and I also figured out I need to use two connections, so that when I commit I don't lose the cursor I made.
But now my problem is that it won't put all the records into newTable as the log suggests. In the console log I see it trying to insert the 500,000th record into the database:
560530 inserting 20551581 and 2176511
But when I do a count on the created table (while it is still running), it shows only about 10,000 rows in the new table:
select count(*) from newTable;
count
-------
10236
And when the program finishes, I only have about 11,000 records in the new table, while the log shows it tried to insert at least 2 million rows. What's wrong with my code?
def fillMyTable(self):
    try:
        self.con=psycopg2.connect(database='XXXX',user='XXXX',password='XXXX',host='localhost')
        cur=self.con.cursor(name="mycursor")
        cur.arraysize=1000
        cur.itersize=2000
        self.con2=psycopg2.connect(database='XXXX',user='XXXX',password='XXXX',host='localhost')
        cur2=self.con2.cursor()
        q="SELECT id,oldgroups from oldTable;"
        cur.execute(q)
        i=0
        while True:
            batch= cur.fetchmany()
            if not batch:
                break
            for row in batch:
                userid=row[0]
                groupids=self.doSomethingOnGroups(row[1])
                for groupid in groupids:
                    # insert only if it does NOT exist
                    i+=1
                    print (str(i)+" inserting "+str(userid)+" and "+str(groupid))
                    q2="INSERT INTO newTable (userid, groupid) SELECT %s, %s WHERE NOT EXISTS ( SELECT %s FROM newTable WHERE groupid = %s);"%(userid,groupid,userid,groupid)
                    cur2.execute(q2)
                    self.con2.commit()
    except psycopg2.DatabaseError, e:
        self.writeLog(e)
    finally:
        cur.close()
        self.con2.commit()
        self.con.close()
        self.con2.close()
Update: I also noticed that it uses a lot of my RAM; isn't a server-side cursor supposed to avoid that?
Cpu(s): 15.2%us, 6.4%sy, 0.0%ni, 56.5%id, 2.8%wa, 0.0%hi, 0.2%si, 18.9%st
Mem: 1695220k total, 1680496k used, 14724k free, 3084k buffers
Swap: 0k total, 0k used, 0k free, 1395020k cached
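For reference, my understanding of the server-side cursor pattern is roughly the following (a simplified sketch with placeholder connection details): iterating the named cursor directly should keep only about itersize rows in client memory at a time.
import psycopg2

con = psycopg2.connect(database='XXXX', user='XXXX', password='XXXX', host='localhost')
cur = con.cursor(name="mycursor")   # named cursor -> rows stay on the server
cur.itersize = 2000                 # rows fetched per network round trip

cur.execute("SELECT id, oldgroups FROM oldTable;")
for row in cur:                     # psycopg2 fetches itersize rows at a time
    userid, oldgroups = row
    # ... process the row here ...

cur.close()
con.close()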
Upvotes: 0
Views: 696
Reputation: 125454
If the oldgroups column is in the form 1,3,6,7, this will work:
insert into newTable (userid, groupid)
select id, groupid
from (
    select
        id,
        regexp_split_to_table(oldgroups, ',')::int as groupid
    from oldTable
) o
where
    not exists (
        select 1
        from newTable
        where groupid = o.groupid
    )
    and groupid < 10000000
But I suspect you want to check for the existence of both groupid and id:
insert into newTable (userid, groupid)
select id, groupid
from (
    select
        id,
        regexp_split_to_table(oldgroups, ',')::int as groupid
    from oldTable
) o
where
    not exists (
        select 1
        from newTable
        where groupid = o.groupid and userid = o.id
    )
    and groupid < 10000000
The regexp_split_to_table function will "explode" the oldgroups column into rows, effectively doing a cross join with the id column.
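If you want to run this from psycopg2 instead of looping row by row in Python, something along these lines should work (a sketch assuming the table and column names from the question, and that groupid is an integer column, hence the ::int cast):
import psycopg2

con = psycopg2.connect(database='XXXX', user='XXXX', password='XXXX', host='localhost')
cur = con.cursor()

# One set-based statement replaces the whole Python loop:
# a row like (42, '1,3,6') in oldTable expands to (42,1), (42,3), (42,6).
cur.execute("""
    INSERT INTO newTable (userid, groupid)
    SELECT id, groupid
    FROM (
        SELECT id,
               regexp_split_to_table(oldgroups, ',')::int AS groupid
        FROM oldTable
    ) o
    WHERE NOT EXISTS (
        SELECT 1
        FROM newTable
        WHERE groupid = o.groupid AND userid = o.id
    );
""")
con.commit()
cur.close()
con.close()
Doing the split and the duplicate check inside the database in one statement is typically much faster than issuing millions of individual INSERTs from Python.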
Upvotes: 1