user1781186

Postgres says key already exists although it doesn't

My Python application uses Psycopg2 to insert content from a web scraper into a PostgreSQL database. Psycopg2 complains that a certain primary key already exists, even though a query for that key returns nothing.

Error:

psycopg2.IntegrityError: duplicate key value violates unique constraint "my_table_pkey"
DETAIL:  Key (id)=(12345) already exists.

Query:

SELECT * FROM my_table where id=12345;
-- 0 rows returned

What is going on here?

Edit for background:

Basically, the code scrapes a discussion forum: it loops over each page in each discussion thread and inserts some data from each page into Postgres. The general structure of the code is outlined below. Note that get returns a well-formatted data structure for each thread.

import psycopg2

base_url = 'http://someforum.com'
conn = psycopg2.connect('dbname=mydb user=me')
cur = conn.cursor()

for i in range(10000):
    # get() is the scraper's own helper; it returns the pages of thread i
    thread = get('{}/{}'.format(base_url, i))
    for page in thread:
        sql = 'INSERT INTO my_table (id, foo, bar) VALUES (%s, %s, %s);'
        values = [page['id'], page['foo'], page['bar']]
        cur.execute(sql, values)
    # commit once per thread
    conn.commit()
cur.close()
conn.close()

Upvotes: 0

Views: 3980

Answers (1)

user1781186

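After some digging, I discovered that the forum I'm scraping sometimes reports the wrong number of pages for a given thread. So when the app tries to scrape page number 5 (which does not exist), it is instead redirected to page number 4 and tries to insert the same posts again. Hence the integrity error. The SELECT finds nothing because the first insert of that id happened in the same transaction, which failed before conn.commit() and was therefore never committed.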
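One way to make the scraper tolerate such repeated pages is sketched below. This is only a minimal sketch under a few assumptions: insert_pages and seen_ids are hypothetical names that are not part of the original code, cur is any DB-API cursor such as the one psycopg2 provides, and the ON CONFLICT clause needs PostgreSQL 9.5 or later. The in-memory set catches repeats within one run, while ON CONFLICT (id) DO NOTHING lets the database silently ignore ids that were stored by an earlier run.

```python
# Sketch only: table and column names follow the question's code;
# insert_pages and seen_ids are hypothetical helpers.

INSERT_SQL = (
    "INSERT INTO my_table (id, foo, bar) VALUES (%s, %s, %s) "
    "ON CONFLICT (id) DO NOTHING;"  # assumes PostgreSQL 9.5 or later
)


def insert_pages(cur, pages, seen_ids):
    """Insert each page once; return the ids skipped as repeats in this run."""
    skipped = []
    for page in pages:
        if page["id"] in seen_ids:
            # Already handled in this run, e.g. after a redirected page request.
            skipped.append(page["id"])
            continue
        seen_ids.add(page["id"])
        cur.execute(INSERT_SQL, [page["id"], page["foo"], page["bar"]])
    return skipped
```

In the loop from the question, this would replace the inner for loop with something like skipped = insert_pages(cur, thread, seen_ids), where seen_ids = set() is created once before the outer loop. Logging the returned ids would have pointed to the redirect problem right away.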

Upvotes: 2
