emax

Reputation: 7235

Postgresql: how to copy multiple columns from one table to another?

I am trying to copy some columns from a table called temporarytable to another one called scalingData using psycopg2 in Python.

scalingData is a pandas DataFrame. It contains data about cities, such as nameOfCities, population, etc.

scalingData = pd.read_csv('myFile.csv')  ## 'myFile.csv' is the datasource

Each column of the DataFrame has a different dtype, such as 'int64', 'float64' or 'O' (object).
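To make that concrete, the dtype-to-Postgres mapping I have in mind could be sketched like this (pg_type is just an illustrative name, not something from pandas or psycopg2):

```python
def pg_type(dtype):
    """Map a pandas dtype name to a Postgres column type."""
    mapping = {
        'int64': 'bigint',
        'float64': 'double precision',
        'O': 'text',        # 'O' is the kind code pandas prints for object columns
        'object': 'text',
    }
    # fall back to text for any dtype not listed above
    return mapping.get(str(dtype), 'text')
```

So for example `pg_type('int64')` returns `'bigint'`, and an unknown dtype falls back to `'text'`.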

Here is a screenshot from Jupyter:

[screenshot of the DataFrame's dtypes]

import psycopg2 as ps
## Populate table scalingData
tmp = scalingData.dtypes
con = None
con = ps.connect(dbname = 'mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
## header holds the column names of scalingData (e.g. header = list(scalingData.columns))
for i in range(0,5):
    j = header[i]
    stat = """  ALTER TABLE "scalingData" ADD COLUMN "%s" """%j
    if tmp[i] == 'int64':
        stat = stat+'bigint'
    if tmp[i] == 'float64':
        stat = stat+'double precision'
    if tmp[i] == 'O':
        stat = stat+'text'
    ### Add Column
    cur.execute(stat)
    stat1 = """INSERT INTO "scalingData" ("%s") SELECT "%s" FROM temporarytable"""%(j,j)
    ### Copy Column
    cur.execute(stat1)
cur.close()    
con.close()

My problem is that if I look at scalingData, only the first column is copied while the others are empty.

Here is a screenshot of the table from pgAdmin after the query:

Also, if I copy, for instance, the second column first, it works, but then it fails with the others as well.

Upvotes: 1

Views: 1001

Answers (2)

Valera

Reputation: 2923

This happens because you add one column to your new table, then insert rows with only that column set, and you repeat that five times. Each INSERT appends new rows rather than filling in the existing ones, so you actually end up with five batches of rows, each with only one column populated.

You need to first set up the structure for your scalingData table, then insert all the records with all fields.

Here is the code (I'm not a Python developer):

import psycopg2 as ps
## Populate table scalingData
tmp = scalingData.dtypes
con = None
con = ps.connect(dbname = 'mydb', user='postgres', host='localhost', password='mypd')
con.autocommit = True
cur = con.cursor()
for i in range(0,5):
    j = header[i]
    stat = """  ALTER TABLE "scalingData" ADD COLUMN "%s" """%j
    if tmp[i] == 'int64':
        stat = stat+'bigint'
    if tmp[i] == 'float64':
        stat = stat+'double precision'
    if tmp[i] == 'O':
        stat = stat+'text'
    ### Add Column
    cur.execute(stat)

fieldsStr = '"' + '", "'.join(header[:5]) + '"' ### yields "header1", "header2", ..., "header5"
stat1 = """INSERT INTO "scalingData" (%s) SELECT %s FROM temporarytable"""%(fieldsStr,fieldsStr)
### Copy Table
cur.execute(stat1)

cur.close()    
con.close()
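The statement-building part of the idea above can be sketched without a database connection (build_copy_sql is an illustrative helper name, and it assumes the column names and dtypes line up one-to-one):

```python
def build_copy_sql(table, source, columns, dtypes):
    """Build one ALTER TABLE per column, plus a single INSERT ... SELECT."""
    type_map = {'int64': 'bigint', 'float64': 'double precision', 'O': 'text'}
    # one ALTER TABLE ... ADD COLUMN statement per column
    alters = ['ALTER TABLE "%s" ADD COLUMN "%s" %s'
              % (table, col, type_map.get(str(dt), 'text'))
              for col, dt in zip(columns, dtypes)]
    # quote each column name individually: "c1", "c2", ...
    fields = ', '.join('"%s"' % col for col in columns)
    insert = ('INSERT INTO "%s" (%s) SELECT %s FROM %s'
              % (table, fields, fields, source))
    return alters, insert
```

Running all the ALTER statements first and then the single INSERT copies every column in one pass, which is what fixes the "only one column populated" symptom.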

Upvotes: 2

Denis de Bernardy

Reputation: 78473

I'm not familiar with Python, but just a guess as to where the issue might be coming from:

"""INSERT INTO "scalingData" ("%s") SELECT "%s" FROM temporarytable"""

... will transform the "%s" bit into "foo, bar, baz" (a single quoted identifier) rather than "foo", "bar", "baz" (three separate identifiers).

Put another way, you should remove the hard-coded double quotes from your statement and quote each individual column name instead.

Double quotes are used in PG to quote identifiers. You can literally have a table or column called "foo, bar, baz" and PG will work just fine when you do - provided it's always in-between double quotes when you use it in a statement.
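A quick sketch of the difference, using plain strings with no database needed:

```python
cols = ['foo', 'bar', 'baz']

# Quoting the joined list produces ONE identifier named 'foo, bar, baz':
wrong = '"%s"' % ', '.join(cols)
# Quoting each name separately produces three identifiers:
right = ', '.join('"%s"' % c for c in cols)

print(wrong)  # "foo, bar, baz"
print(right)  # "foo", "bar", "baz"
```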

Upvotes: 0
