Reputation: 8415
I am importing a CSV file to postgres and there is no unique column in the dataset. I want to add a serial ID field to uniquely identify each record as its inserted into the table.
I have created a sequence and added an ID field to the table structure before triggering the import:
CREATE SEQUENCE IF NOT EXISTS serial;
CREATE TABLE my_tbl (
fname varchar(100),
lname varchar(100),
company varchar(200),
id integer PRIMARY KEY DEFAULT nextval('serial')
);
I run this code to import the CSV which has data for fname, lname and company:
conn = psycopg2.connect(dbname=dbname, host=host, port=port, user=user, password=pwd)
cur = conn.cursor()
cur.copy_expert("copy {} from STDIN CSV HEADER QUOTE '\"'".format(table_name), file)
cur.execute("commit;")
However, I get an error saying I'm missing data for field "id". I assume under the hood psycopg2 is matching the schemas of the CSV and PG table to validate the COPY before it attempts the insert. A regular insert would succeed as the id field would be populated with a value from the SEQ.
How can I add a unique id field to each record that's copied from the CSV to the PG table?
Upvotes: 1
Views: 1820
Reputation: 121764
You have two options. You can specify columns of the target table in the COPY command, e.g.:
COPY my_tbl(fname, lname, company) FROM STDIN CSV HEADER QUOTE '"'
Alternatively, create the table without the id
primary key, import the csv data and only then add the primary key:
ALTER TABLE my_tbl ADD id serial PRIMARY KEY;
Not related. You do not have to create a sequence for a serial column, let Postgres do this for you:
CREATE TABLE my_tbl (
fname varchar(100),
lname varchar(100),
company varchar(200),
id serial PRIMARY KEY
);
Then the system knows the relationship between the table and the sequence. (Also, serial
is not the best name for a sequence, how do you name the next one when you need it?)
Upvotes: 1