user1592380
user1592380

Reputation: 36267

Preventing duplicate entries with sqlalchemy in preexisting sqllite table

I have a preexisting sqllite table which I am accessing with sqlalchemy. I realized has a number of duplicate 'case' numbers exist. If I understand correctly it appears that you cannot add a unique constraint to a table after table creation with sqllite, after removing the dups using:

DELETE FROM mytable
WHERE id NOT IN
(
SELECT MIN(id)
FROM judgements
GROUP BY "case"

I've decided to use sqlalchemy to prevent addition of additional dups. I'm working with scrapy and have a pipeline element that looks like:

class DynamicSQLlitePipeline(object):

    def __init__(self,table_name):
        db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text , unique=True),
                              ....)
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items



    def process_item(self, item, spider):

        try:
            ins_query = self.stack_items.insert().values(
            case=item['case'],
            ....
            )
            self.connection.execute(ins_query)
        except IntegrityError:
                print('THIS IS A DUP')
        return item

The only change I've made is to add unique=True to column 'case' . However on testing, dups are still being added/ How can I get this working?

Upvotes: 1

Views: 1290

Answers (1)

ichbinblau
ichbinblau

Reputation: 4809

The code snippet below works on my side with python version 2.7 and sqlalchemy version 1.0.9 and sqlite version 3.15.2.

from sqlalchemy import create_engine, MetaData, Column, Integer, Table, Text
from sqlalchemy.exc import IntegrityError


class DynamicSQLlitePipeline(object):

    def __init__(self, table_name):
        db_path = "sqlite:///data.db"
        _engine = create_engine(db_path)
        _connection = _engine.connect()
        _metadata = MetaData()
        _stack_items = Table(table_name, _metadata,
                             Column("id", Integer, primary_key=True),
                             Column("case", Text, unique=True),)
        _metadata.create_all(_engine)
        self.connection = _connection
        self.stack_items = _stack_items

    def process_item(self, item):

        try:
            ins_query = self.stack_items.insert().values(case=item['case'])
            self.connection.execute(ins_query)
        except IntegrityError:
                print('THIS IS A DUP')
        return item

if __name__ == '__main__':

    d = DynamicSQLlitePipeline("pipeline")
    item = {
        'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'
    }
    print d.process_item(item)

And the output for the second run would be like :

THIS IS A DUP
{'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'}

I did not see much difference between your code logic. The only difference might be the version I guess.

Upvotes: 1

Related Questions