learner

Reputation: 23

Update one postgres table from another postgres table

I am loading a batch CSV file into Postgres (say Table A) using Python. I am using pandas to upload the data in chunks, which is quite fast:

for chunk in pd.read_csv(csv_file, sep='|',chunksize=chunk_size,low_memory=False):
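For context, a minimal runnable sketch of this chunked load is below. It uses an in-memory CSV string and SQLite as a stand-in for the real pipe-delimited file and the Postgres connection (the file contents, column names, and table name `TableA` are all hypothetical):

```python
import io
import sqlite3

import pandas as pd

# Hypothetical pipe-delimited CSV standing in for the real batch file
csv_file = io.StringIO("id|col1|col2\n1|a|b\n2|c|d\n3|e|f\n")

# SQLite in-memory DB standing in for the Postgres connection
conn = sqlite3.connect(":memory:")

chunk_size = 2
for chunk in pd.read_csv(csv_file, sep="|", chunksize=chunk_size, low_memory=False):
    # Append each chunk into the staging table (Table A)
    chunk.to_sql("TableA", conn, if_exists="append", index=False)

count = conn.execute("SELECT COUNT(*) FROM TableA").fetchone()[0]
print(count)  # 3
```

With a real Postgres database, `conn` would be a SQLAlchemy engine and `chunk.to_sql(...)` would write into the actual staging table; the chunking pattern is the same.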

Now I want to update another table (say Table B) from Table A, based on the following rules.

I am able to do that using the code below and then looping through each row, but Table A always has around 1,825,172 records, so it becomes extremely slow. Can any forum member help speed this up or suggest an alternative approach?

cursor.execute(sql)
records = cursor.fetchall()

for row in records:
    id = 0 if row[0] is None else row[0]   # used to match against Table B and decide insert vs. update
    id2 = 0 if row[1] is None else row[1]
    id3 = 0 if row[2] is None else row[2]

Upvotes: 1

Views: 113

Answers (2)

kutschkem

Reputation: 8163

You should do this entirely inside the DBMS rather than looping through the records in your Python script; that lets the DBMS optimize the operation.

UPDATE TableB
SET    x = TableA.y
FROM   TableA
WHERE  TableA.id = TableB.id;

INSERT INTO TableB (id, x)
SELECT id, y
FROM   TableA
WHERE  TableA.id NOT IN ( SELECT id FROM TableB );
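A runnable check of this two-statement pattern, using SQLite in place of Postgres (table and column names `TableA`, `TableB`, `x`, `y` are the placeholders from the answer; the `UPDATE` is written in the portable correlated-subquery form, since `UPDATE ... FROM` as shown above is Postgres syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE TableA (id INTEGER PRIMARY KEY, y TEXT)")
cur.execute("CREATE TABLE TableB (id INTEGER PRIMARY KEY, x TEXT)")
cur.executemany("INSERT INTO TableA VALUES (?, ?)",
                [(1, "new1"), (2, "new2"), (3, "new3")])
cur.executemany("INSERT INTO TableB VALUES (?, ?)",
                [(1, "old1"), (2, "old2")])

# Step 1: update rows that already exist in TableB
# (correlated-subquery form; Postgres would use UPDATE ... FROM as in the answer)
cur.execute("""
    UPDATE TableB
    SET x = (SELECT y FROM TableA WHERE TableA.id = TableB.id)
    WHERE id IN (SELECT id FROM TableA)
""")

# Step 2: insert rows that are missing from TableB
cur.execute("""
    INSERT INTO TableB (id, x)
    SELECT id, y FROM TableA
    WHERE TableA.id NOT IN (SELECT id FROM TableB)
""")
conn.commit()

rows = cur.execute("SELECT id, x FROM TableB ORDER BY id").fetchall()
print(rows)  # [(1, 'new1'), (2, 'new2'), (3, 'new3')]
```

One caveat with the `NOT IN (SELECT id ...)` step: if `TableB.id` can be NULL, `NOT IN` matches nothing, so `NOT EXISTS` is often the safer form.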

Upvotes: 0

GMB

Reputation: 222462

You could leverage Postgres upsert syntax (INSERT ... ON CONFLICT), like:

insert into tableB as tb (id, col1, col2)
select ta.id, ta.col1, ta.col2 from tableA ta
on conflict (id) do update
    set col1 = excluded.col1, col2 = excluded.col2;

Note that in the do update clause, the incoming row is referenced through the special excluded alias, not through the select alias ta.
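The upsert behaves the same way in SQLite (version 3.24+ also supports `ON CONFLICT ... DO UPDATE` with `excluded`), which makes it easy to verify locally; table and column names here are the placeholders from the answer, and the `WHERE true` in the `SELECT` is required by SQLite's parser when combining `SELECT` with `ON CONFLICT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tableA (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
cur.execute("CREATE TABLE tableB (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
cur.executemany("INSERT INTO tableA VALUES (?, ?, ?)",
                [(1, "a1", "a2"), (2, "b1", "b2")])
cur.execute("INSERT INTO tableB VALUES (1, 'old1', 'old2')")  # id=1 will conflict

# One statement: insert new rows, update existing ones via excluded.*
cur.execute("""
    INSERT INTO tableB (id, col1, col2)
    SELECT id, col1, col2 FROM tableA WHERE true
    ON CONFLICT (id) DO UPDATE
        SET col1 = excluded.col1, col2 = excluded.col2
""")
conn.commit()

rows = cur.execute("SELECT id, col1, col2 FROM tableB ORDER BY id").fetchall()
print(rows)  # [(1, 'a1', 'a2'), (2, 'b1', 'b2')]
```

Row 1 was updated in place and row 2 was inserted, all in a single set-based statement, which avoids the per-row Python loop entirely.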

Upvotes: 2
