Reputation: 23
I am loading a batch CSV file into Postgres (say Table A) using Python. I am using pandas to upload the data in chunks, which is quite fast.
for chunk in pd.read_csv(csv_file, sep='|', chunksize=chunk_size, low_memory=False):
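For reference, a minimal sketch of how such a chunked load might look end to end (the connection string, file path, table name, and use of to_sql are assumptions, not taken from the original post):

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string, file path, and table name.
engine = create_engine("postgresql+psycopg2://user:pass@localhost/mydb")
csv_file = "batch.csv"
chunk_size = 100_000

for chunk in pd.read_csv(csv_file, sep='|', chunksize=chunk_size, low_memory=False):
    # Append each chunk to Table A instead of loading the whole file at once.
    chunk.to_sql("table_a", engine, if_exists="append", index=False)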
Now I want to update another table (say Table B) from Table A based on the following rules.
I am able to do that with the code below, looping through each row, but Table A always has around 1,825,172 records and it becomes extremely slow. Can anyone help speed this up or suggest an alternative approach to achieve the same result?
cursor.execute(sql)
records = cursor.fetchall()
for row in records:
    id = 0 if row[0] is None else row[0]   # used to match against Table B and decide insert vs. update
    id2 = 0 if row[1] is None else row[1]
    id3 = 0 if row[2] is None else row[2]
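For context, the pattern that usually follows such a loop is a per-row check and insert/update against Table B, roughly like the sketch below (table and column names are placeholders, and a psycopg2-style connection `conn` is assumed). Issuing one or two statements per row is what makes this slow at ~1.8 million records.

for row in records:
    key = 0 if row[0] is None else row[0]
    # One round trip per row to see whether the key already exists in Table B...
    cursor.execute("SELECT 1 FROM tableB WHERE id = %s", (key,))
    if cursor.fetchone():
        # ...and another to update it...
        cursor.execute("UPDATE tableB SET col1 = %s WHERE id = %s", (row[1], key))
    else:
        # ...or to insert it.
        cursor.execute("INSERT INTO tableB (id, col1) VALUES (%s, %s)", (key, row[1]))
conn.commit()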
Upvotes: 1
Views: 113
Reputation: 8163
You should do this entirely inside the DBMS rather than looping through the records in your Python script. That lets the DBMS optimize the work.
UPDATE TableB
SET x = TableA.y
FROM TableA
WHERE TableA.id = TableB.id;
INSERT INTO TableB (id, x)
SELECT id, y
FROM TableA
WHERE TableA.id NOT IN (SELECT id FROM TableB);
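If you are driving this from Python, both statements can be sent in one transaction; a minimal sketch assuming psycopg2 and a hypothetical connection string:

import psycopg2

# Placeholder connection details.
conn = psycopg2.connect("dbname=mydb user=postgres")
with conn, conn.cursor() as cur:
    # Both statements run in a single transaction; the commit happens
    # automatically when the 'with conn' block exits without an exception.
    cur.execute("""
        UPDATE TableB
        SET x = TableA.y
        FROM TableA
        WHERE TableA.id = TableB.id
    """)
    cur.execute("""
        INSERT INTO TableB (id, x)
        SELECT id, y
        FROM TableA
        WHERE TableA.id NOT IN (SELECT id FROM TableB)
    """)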
Upvotes: 0
Reputation: 222462
You could leverage Postgres upsert syntax, like:
insert into tableB (id, col1, col2)
select ta.id, ta.col1, ta.col2 from tableA ta
on conflict (id) do update
set col1 = excluded.col1, col2 = excluded.col2;
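Note that on conflict (id) requires a unique constraint or primary key on tableB.id, and inside the do update clause the incoming values must be referenced through the special excluded pseudo-row rather than the select alias. A sketch of how this slots into the Python flow after the chunked load has populated tableA (assuming psycopg2; connection details and column names are placeholders):

import psycopg2

UPSERT_SQL = """
    insert into tableB (id, col1, col2)
    select ta.id, ta.col1, ta.col2 from tableA ta
    on conflict (id) do update
    set col1 = excluded.col1, col2 = excluded.col2
"""

# Run once after the chunked CSV load has finished; this replaces the
# row-by-row Python loop with a single set-based statement.
conn = psycopg2.connect("dbname=mydb user=postgres")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(UPSERT_SQL)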
Upvotes: 2