Francesco Della Vedova
Francesco Della Vedova

Reputation: 943

SQLalchemy duplicate entry error for unknown value

I am writing a script to synchronize Adwords accounts and a local database wit Sqlalchemy. I am following the object hierarchy of the Adwords API, so my first table is 'campaigns' and the second is 'adgroups'

here is how I define the two:

class Campaign(Base):

__tablename__ = 'aw_campaigns'

id = Column(Integer, primary_key=True)
name = Column(String(99))
impressions = Column(Integer)
serving_status = Column(String(99))
start_date = Column(String(99))
status = Column(String(99))

def __init__(self, id, name, impressions, serving_status, start_date, status):
    self.id = id
    self.name = name
    self.impressions = impressions
    self.serving_status = serving_status
    self.start_date = start_date
    self.status = status

class Adgroup(Base):

__tablename__ = 'aw_adgroups'

id = Column(Integer, primary_key=True)  # , primary_key=True
name = Column(String(99))
camp_id = Column(Integer, ForeignKey('aw_campaigns.id'))  # , ForeignKey('aw_campaigns.id')
camp_name = Column(String(99))
ctr = Column(Float)
cost = Column(Float)
impressions = Column(Integer)
clicks = Column(Integer)
status = Column(String(99))

def __init__(self, id, name, camp_id, camp_name, ctr, cost, impressions, clicks, status):
    self.id = id
    self.name = name
    self.camp_id = camp_id
    self.camp_name = camp_name
    self.ctr = ctr
    self.cost = cost
    self.impressions = impressions
    self.clicks = clicks
    self.status = status

I query the API, and then build the list of objects for the lines in the Adgroup table:

adgr_query = 'SELECT CampaignId, CampaignName, Clicks, Cost, Impressions, Ctr, Id, KeywordMaxCpc, Name, Settings, Status'
adgr_page = ad_group_serv.Query(adgr_query)[0]['entries']

adgr_ins = [Adgroup(i['id'],
            i['name'],
            i['campaignId'],
            i['campaignName'],
            i['stats']['ctr'],
            i['stats']['cost']['microAmount'],
            i['stats']['impressions'],
            i['stats']['clicks'],
            i['status']) for i in adgr_page if int(i['id']) not in adgr_exist]

but when I commit I get the error:

 (IntegrityError) (1062, "Duplicate entry '2147483647' for key 'PRIMARY'")

The problem is that I have no idea where that value is from.

'2147483647' in [i['id'] for i in adgr_page]
>>> False
'2147483647' in str(adgr_page)
>>> False

I am really stuck on this.

Upvotes: 1

Views: 664

Answers (1)

vvladymyrov
vvladymyrov

Reputation: 5793

Looks like you have integer overflow somewhere.

The symptom: 2147483647 is 2**31-1 - indicates that 32 bits were used to store the number.

AdGroup.Id field has type xsd:long which has 64 bits length.

Python itself has no limitation on the size of integer value but database may has such limit.

Short solution:

Try to use BigInteger sqltype type id = Column(BigInteger, primary_key=True) and the same for camp_id and the rest xsd:long values coming from AdWords API. There is chance that SQLAlchemy will pick database specific big integer column type. Or you can use String(64) as a type for id. But in this case it you'll need extra step to generate primary key.

How many entries your query to AdWords API return? Are there more then 2**32 records? I doubt it - it is unlikely that your database will be able to handle ~4200 millions of records.

Solution 2 - long term

Although I would suggest to no trust primary key integrity to external source and would rely on database to generate primary key using autoincrement and rely on SQLAlchemy to handle foreign keys population based on database generated primary keys:

class Adgroup(Base):
    __tablename__ = 'aw_adgroups'
    id = Column(Integer, Sequence('adgroup_seq'), primary_key=True)  # , primary_key=True
    adGroupId = Column(String(64)) 
    campaignId = Column(Integer,ForeignKey('aw_campaigns.id'))
    campaign = relationship("Campaign", backref = "adgroup")
    ...

class Campaign(Base):
    __tablename__ = 'aw_campaigns'
    id = Column(Integer, Sequence('adgroup_seq'), primary_key=True)
    campaignId = Column(String(64))
    ...

Also looks like you may need to do look up by campaignId and adGroupId - so you can add indexes on them.

Then you create your Campaign and AdGroup objects and just add relations between them. The code will depend on type of relationship your want to use - one-to-many or many-to-many. Check sqlalchemy relationship manual for more details.

ag = AdGroup(**kwargs)
camp = Campaign(**kwargs)
ag.campaign = camp
session.add(ag)

Upvotes: 1

Related Questions