Reputation: 4374
We have a few functionalities in our Flask web application that consist of a single function call that calls a lot of sub-functions and does a lot of work behind the scenes. For example, it adds (financial) transactions to the (MSSQL) database, writes entries to a log table in the database, and changes properties of specific objects, resulting in changed columns in specific tables in our database. All of this is done using SQLAlchemy through objects.
In a new approach, for testability, and because we sometimes want to only display these changes without actually committing them to the database, we have these functions return a compound Python object that contains all of the changed objects. So, instead of committing database changes inside the function and its sub-functions, we have them return the changed objects, so that we can decide outside of the main function whether to show or save them.
So the main function returns a compound object with all of these changed objects included, and outside of the main function we add those changed objects to our SQLAlchemy session and commit the session to the database (or, if we just need to display information, we don't add and commit). The way we do this is that the compound result object has a `save_to_session()` method that saves our changed objects with SQLAlchemy's `bulk_save_objects()` operation:

```python
if result:
    result.save_to_session(current_app.db_session)
    current_app.db_session.commit()
```

where `save_to_session()` is implemented as:

```python
def save_to_session(self, session):
    session.bulk_save_objects(self.adminlog)
    ...
```
This new approach has led to an error we didn't expect on the `current_app.db_session.commit()` line. At the end of the process, when we add the returned objects to the session and try to commit the session to the database, we get a duplicate key error. It looks like the returned objects were already added to the session somewhere during the process, and SQLAlchemy tries to insert them twice. We have come to this conclusion because, when we comment out the `bulk_save_objects()` call, the error disappears, while the changed data is still committed to the database correctly, exactly once.

When we inspect the database after this error occurs, there are no records with the primary key mentioned in the error message, because of the rollback that occurs on error. So it's not that the record already existed in the database either; rather, the session tries to insert the same record twice.
This is the error we get, using pymssql as a driver:
```
sqlalchemy.exc.IntegrityError: (pymssql.IntegrityError) (2627,
b"Violation of PRIMARY KEY constraint 'PK_adminlog_id'.
Cannot insert duplicate key in object 'dbo.adminlog'.
The duplicate key value is (0E5537FF-E45C-40C5-98FC-7B1ACAD8104E).
DB-Lib error message 20018, severity 14:\n
General SQL Server error: Check messages from the SQL Server\n
")
[SQL:
'INSERT INTO adminlog (
    alog_id,
    alog_ppl_id,
    alog_user_ppl_id,
    alog_user_name,
    alog_datetime,
    [alog_ipAddress],
    [alog_macAddress],
    alog_comment,
    alog_type,
    alog_act_id,
    alog_comp_id,
    alog_artc_id)
VALUES (
    %(alog_id)s,
    %(alog_ppl_id)s,
    %(alog_user_ppl_id)s,
    %(alog_user_name)s,
    %(alog_datetime)s,
    %(alog_ipAddress)s,
    %(alog_macAddress)s,
    %(alog_comment)s,
    %(alog_type)s,
    %(alog_act_id)s,
    %(alog_comp_id)s,
    %(alog_artc_id)s)']
[parameters: (
    {'alog_act_id': None,
     'alog_comment': 'Le service a été ajouté. Cours Coll (119,88)',
     'alog_datetime': datetime.datetime(2018, 10, 29, 13, 46, 54, 837178),
     'alog_macAddress': b'4A-NO-NY-MO-US',
     'alog_type': b'user',
     'alog_artc_id': None,
     'alog_comp_id': None,
     'alog_id': b'0E5537FF-E45C-40C5-98FC-7B1ACAD8104E',
     'alog_user_ppl_id': b'99999999-9999-9999-1111-999999999999',
     'alog_user_name': 'System',
     'alog_ipAddress': b'0.0.0.0',
     'alog_ppl_id': b'AE841D1C-5D8D-47F7-B81F-89C5C931BD14'},
    {'alog_act_id': None,
     'alog_comment': 'Le service a été supprimé. 01/12/2019 Cours Coll (119,88)',
     'alog_datetime': datetime.datetime(2018, 10, 29, 13, 46, 55, 71600),
     'alog_macAddress': b'4A-NO-NY-MO-US',
     'alog_type': b'user',
     'alog_artc_id': None,
     'alog_comp_id': None,
     'alog_id': b'E22176FB-7490-470F-A8BA-A35D5F55A96A',
     'alog_user_ppl_id': b'99999999-9999-9999-1111-999999999999',
     'alog_user_name': 'System',
     'alog_ipAddress': b'0.0.0.0',
     'alog_ppl_id': b'AE841D1C-5D8D-47F7-B81F-89C5C931BD14'}
)]
```
We get a similar error using PyODBC:
```
sqlalchemy.exc.IntegrityError: (pyodbc.IntegrityError) ('23000',
"[23000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Violation of PRIMARY KEY constraint 'PK_adminlog_id'.
Cannot insert duplicate key in object 'dbo.adminlog'.
The duplicate key value is (F5CABD8F-E000-4677-8F5F-78B4CD3B9560). (2627) (SQLExecDirectW);
[23000] [Microsoft][SQL Server Native Client 11.0][SQL Server]The statement has been terminated. (3621)")
[SQL: 'INSERT INTO adminlog (
    alog_id,
    alog_ppl_id,
    alog_user_ppl_id,
    alog_user_name,
    alog_datetime,
    [alog_ipAddress],
    [alog_macAddress],
    alog_comment,
    alog_type,
    alog_act_id,
    alog_comp_id,
    alog_artc_id)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)']
[parameters: (
    (b'F5CABD8F-E000-4677-8F5F-78B4CD3B9560',
     b'0D10D3EF-F37E-45BE-8EED-B5987AE80732',
     b'99999999-9999-9999-1111-999999999999',
     'System',
     datetime.datetime(2018, 10, 29, 13, 51, 30, 555495),
     b'0.0.0.0',
     b'4A-NO-NY-MO-US',
     'Le service a été ajouté. Cours Coll (119,88)',
     b'user',
     None,
     None,
     None),
    (b'39395ACA-0AFB-4C5F-90D4-0C6F95D7B8BC',
     b'0D10D3EF-F37E-45BE-8EED-B5987AE80732',
     b'99999999-9999-9999-1111-999999999999',
     'System',
     datetime.datetime(2018, 10, 29, 13, 51, 30, 777909),
     b'0.0.0.0',
     b'4A-NO-NY-MO-US',
     'Le service a été supprimé. 01/12/2019 Cours Coll (119,88)',
     b'user',
     None,
     None,
     None)
)]
```
My question is: is there an automatic process that adds (changed) objects to the session without us calling `session.add()`?

And is there an option in SQLAlchemy to disable this behaviour, so that objects only end up in the session when they are explicitly added with `session.add(object)`?
Upvotes: 0
Views: 2338
Reputation: 52929
> My question is: is there an automatic process that adds (changed) objects to the session without us calling `session.add()`?
There is at least one feature that pulls objects into a `Session` without them being explicitly added: the save-update cascade. When an object is added to a `Session`, all objects associated with it through `relationship()` attributes that have this cascade configured are placed in the `Session` as well. The same happens when an object is associated with another object that is already in a `Session`.
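To make the cascade concrete, here is a minimal, self-contained sketch; the `Person`/`AdminLog` models are hypothetical stand-ins, not the asker's actual schema:

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, relationship

Base = declarative_base()


class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    # "save-update" is part of the default cascade, so AdminLog objects
    # reached through this relationship follow a Person into the Session.
    logs = relationship("AdminLog")


class AdminLog(Base):
    __tablename__ = "adminlog"
    id = Column(Integer, primary_key=True)
    person_id = Column(Integer, ForeignKey("person.id"))
    comment = Column(String)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(bind=engine)
person = Person(logs=[AdminLog(comment="added via cascade")])

session.add(person)               # only the Person is added explicitly
print(person.logs[0] in session)  # True: the AdminLog was cascaded in
session.commit()                  # inserts the Person AND the AdminLog
```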
> Is there an option in SQLAlchemy to disable this behaviour, so that objects only end up in the session when they are explicitly added with `session.add(object)`?
You can of course configure the `relationship()` attributes to not include this cascade, but there does not seem to be a global switch that would disable cascades altogether.
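For example, save-update can be dropped from a specific relationship by overriding the default cascade setting, which is `"save-update, merge"`. A sketch against the hypothetical models above:

```python
class Person(Base):
    __tablename__ = "person"
    id = Column(Integer, primary_key=True)
    # Keep "merge" but leave out "save-update": AdminLog objects reached
    # through this relationship no longer follow a Person into a Session.
    logs = relationship("AdminLog", cascade="merge")
```

With that configuration each `AdminLog` would have to be passed to `session.add()` (or `session.add_all()`) explicitly.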
If this cascade is at play in your code, then the objects are being inserted twice because you have, in effect, done so explicitly: the cascade places them in the `Session`, where the flush inserts them, and `bulk_save_objects()` inserts them a second time. The bulk operations omit most of the more advanced features of `Session` in favour of raw performance – for example, they do not coordinate with the `Session` on whether an object has already been persisted, nor do they attach the persisted objects to the `Session`:
> The objects as given have no defined relationship to the target `Session`, even when the operation is complete, meaning there's no overhead in attaching them or managing their state in terms of the identity map or session.
As to why the problem arises in the first place: you should not need to keep a manual "staging area" – your compound object – around at all. That is exactly what the `Session` is for, combined with proper use of transactions. The functions and sub-functions should add objects to the `Session` when it makes sense, but they should not control the ongoing transaction. That should happen only outside your main function, where you are currently handling your compound object. If you roll back, all the changes go away.
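Sketched with the hypothetical models from above (the function and flag names are made up for illustration):

```python
def add_service(session, comment):
    """Sub-function: stage a change in the session it is handed.

    Nothing is written to the database here; the caller owns the
    transaction and decides its fate.
    """
    log = AdminLog(comment=comment)
    session.add(log)
    return log


session = Session(bind=engine)
add_service(session, "Le service a été ajouté.")

display_only = True  # hypothetical flag: preview instead of persist
if display_only:
    session.rollback()  # every staged change is discarded
else:
    session.commit()    # every staged change is persisted, exactly once
```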
In testing you can pass around a `Session` that has joined an external transaction, which is explicitly rolled back at the end, no matter what the code under test does.
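That is the "joining a session into an external transaction" recipe from the SQLAlchemy documentation; roughly (`run_code_under_test` is a hypothetical placeholder):

```python
# The code under test receives `session` as usual, but everything it
# does rides on the outer transaction, which the test rolls back.
connection = engine.connect()
trans = connection.begin()
session = Session(bind=connection)

try:
    run_code_under_test(session)  # hypothetical function under test
finally:
    session.close()
    trans.rollback()   # undo everything the code under test did
    connection.close()
```

In the basic form of this recipe, `session.commit()` inside the code under test does not end the outer transaction, so the final `trans.rollback()` still discards all changes; handling `session.rollback()` in the tested code requires the SAVEPOINT-based variant described in the docs.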
Upvotes: 1