Reputation: 23
I am loading a batch CSV file into Postgres (say Table A) using Python. I am using pandas to upload the data in chunks, which is quite fast.
for chunk in pd.read_csv(csv_file, sep='|', chunksize=chunk_size, low_memory=False):
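For reference, a minimal sketch of how such a chunked load might look end to end (the connection string, file path, table name, and use of to_sql are assumptions, not taken from the original post):

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, file path, and table name.
engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")
csv_file = "batch.csv"
chunk_size = 100_000

for chunk in pd.read_csv(csv_file, sep='|', chunksize=chunk_size, low_memory=False):
    # Append each chunk to Table A instead of loading the whole file at once.
    chunk.to_sql("table_a", engine, if_exists="append", index=False)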
Now I want to update another table (say Table B) from Table A based on the following rules.
I am able to do that with the code below, looping through each row, but Table A always has around 1,825,172 records and it becomes extremely slow. Can anyone help speed this up or suggest an alternative approach to achieve the same result?
cursor.execute(sql)
records = cursor.fetchall()
for row in records:
    id = 0 if row[0] is None else row[0]   # used to match against Table B and decide insert vs. update
    id2 = 0 if row[1] is None else row[1]
    id3 = 0 if row[2] is None else row[2]
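For context, the pattern that usually follows such a loop is a per-row check and insert/update against Table B, roughly like the sketch below (table and column names are placeholders, and a psycopg2-style connection `conn` is assumed). Issuing one or two statements per row is what makes this slow at ~1.8 million records.

for row in records:
    key = 0 if row[0] is None else row[0]
    # One round trip per row to see whether the key already exists in Table B...
    cursor.execute("SELECT 1 FROM tableB WHERE id = %s", (key,))
    if cursor.fetchone():
        # ...and another to update it...
        cursor.execute("UPDATE tableB SET col1 = %s WHERE id = %s", (row[1], key))
    else:
        # ...or to insert it.
        cursor.execute("INSERT INTO tableB (id, col1) VALUES (%s, %s)", (key, row[1]))
conn.commit()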
Upvotes: 1
Views: 113
Reputation: 8163
You should do this entirely inside the DBMS rather than looping through the records in your Python script. That lets the DBMS optimize the work.
UPDATE TableB
SET x = TableA.y
FROM TableA
WHERE TableA.id = TableB.id;
INSERT INTO TableB (id, x)
SELECT id, y
FROM TableA
WHERE TableA.id NOT IN (SELECT id FROM TableB);
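If you are driving this from Python, both statements can be sent in one transaction; a minimal sketch assuming psycopg2 and a hypothetical connection string:

import psycopg2

# Placeholder connection details.
conn = psycopg2.connect("dbname=mydb user=postgres")
with conn, conn.cursor() as cur:
    # Both statements run in a single transaction; the commit happens
    # automatically when the 'with conn' block exits without an exception.
    cur.execute("""
        UPDATE TableB
        SET x = TableA.y
        FROM TableA
        WHERE TableA.id = TableB.id
    """)
    cur.execute("""
        INSERT INTO TableB (id, x)
        SELECT id, y
        FROM TableA
        WHERE TableA.id NOT IN (SELECT id FROM TableB)
    """)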
Upvotes: 0
Reputation: 222462
You could leverage Postgres upsert syntax, like:
insert into tableB (id, col1, col2)
select ta.id, ta.col1, ta.col2 from tableA ta
on conflict (id) do update
set col1 = excluded.col1, col2 = excluded.col2;
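Note that on conflict (id) requires a unique constraint or primary key on tableB.id, and inside the do update clause the incoming values must be referenced through the special excluded pseudo-row rather than the select alias. A sketch of how this slots into the Python flow after the chunked load has populated tableA (assuming psycopg2; connection details and column names are placeholders):

import psycopg2

UPSERT_SQL = """
    insert into tableB (id, col1, col2)
    select ta.id, ta.col1, ta.col2 from tableA ta
    on conflict (id) do update
    set col1 = excluded.col1, col2 = excluded.col2
"""

# Run once after the chunked CSV load has finished; this replaces the
# row-by-row Python loop with a single set-based statement.
conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(UPSERT_SQL)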
Upvotes: 2