Reputation: 36267
I have a preexisting sqllite table which I am accessing with sqlalchemy. I realized has a number of duplicate 'case' numbers exist. If I understand correctly it appears that you cannot add a unique constraint to a table after table creation with sqllite, after removing the dups using:
DELETE FROM mytable
WHERE id NOT IN
(
SELECT MIN(id)
FROM judgements
GROUP BY "case"
I've decided to use sqlalchemy to prevent addition of additional dups. I'm working with scrapy and have a pipeline element that looks like:
class DynamicSQLlitePipeline(object):
def __init__(self,table_name):
db_path = "sqlite:///"+settings.SETTINGS_PATH+"\\data.db"
_engine = create_engine(db_path)
_connection = _engine.connect()
_metadata = MetaData()
_stack_items = Table(table_name, _metadata,
Column("id", Integer, primary_key=True),
Column("case", Text , unique=True),
....)
_metadata.create_all(_engine)
self.connection = _connection
self.stack_items = _stack_items
def process_item(self, item, spider):
try:
ins_query = self.stack_items.insert().values(
case=item['case'],
....
)
self.connection.execute(ins_query)
except IntegrityError:
print('THIS IS A DUP')
return item
The only change I've made is to add unique=True to column 'case' . However on testing, dups are still being added/ How can I get this working?
Upvotes: 1
Views: 1290
Reputation: 4809
The code snippet below works on my side with python version 2.7 and sqlalchemy version 1.0.9 and sqlite version 3.15.2.
from sqlalchemy import create_engine, MetaData, Column, Integer, Table, Text
from sqlalchemy.exc import IntegrityError
class DynamicSQLlitePipeline(object):
def __init__(self, table_name):
db_path = "sqlite:///data.db"
_engine = create_engine(db_path)
_connection = _engine.connect()
_metadata = MetaData()
_stack_items = Table(table_name, _metadata,
Column("id", Integer, primary_key=True),
Column("case", Text, unique=True),)
_metadata.create_all(_engine)
self.connection = _connection
self.stack_items = _stack_items
def process_item(self, item):
try:
ins_query = self.stack_items.insert().values(case=item['case'])
self.connection.execute(ins_query)
except IntegrityError:
print('THIS IS A DUP')
return item
if __name__ == '__main__':
d = DynamicSQLlitePipeline("pipeline")
item = {
'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'
}
print d.process_item(item)
And the output for the second run would be like :
THIS IS A DUP
{'case': 'sdjwaichjkneirjpewjcmelkdfpoewrjlkxncdsd'}
I did not see much difference between your code logic. The only difference might be the version I guess.
Upvotes: 1