ZergRush
ZergRush

Reputation: 65

SQLAlchemy insert, update precondition (consistency check)

I would like to keep an invariant when inserting into or updating an 'interval' many-to-many relation using SQLAlchemy. The goal is to make sure that there are no overlapping intervals in the table that represents the intervals for the same object (A).

For example:

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)
    intervals = relationship('Interval', backref='a', cascade='all, delete-orphan')


class B(Base):
    __tablename__ = 'b'
    id = Column(Integer, primary_key=True)
    intervals = relationship('Interval', backref='b', cascade='all, delete-orphan')

class Interval(Base):
    __tablename__ = 'interval'
    id = Column(Integer, primary_key=True)
    a_id = Column(Integer, ForeignKey('a.id', ondelete='cascade'), nullable=False)
    b_id = Column(Integer, ForeignKey('b.id', ondelete='cascade'), nullable=False)
    start = Column(DateTime, nullable=False)
    end = Column(DateTime, nullable=False)

Updated based on van's notes (thanks):

So I want to ensure before/during insert and update that there is no record in the 'interval' table where

((interval.start >= start and interval.start <= end) or
(interval.end >= start and interval.end <= end) or
(start >= interval.start and start <= interval.end) or
(end >= interval.start and end <= interval.end)) and
a_id = interval.a_id

So the question would be what the best way is to achieve this with SQLAlchemy using a MySQL backend. I want to make sure this check is as atomic as possible and there is no possibility to violate it with concurrent operations. I hope the question is clear. Thanks for the help in advance.

Update 1:

Answers to van's questions:

What is your workflow in case of validation failure?

What will you do with those Intervals which failed validation and were not added to a?

note that they will still be committed to DB because they belong to b1 but have a_id = NULL (you do allow NULLs in your current model)

What should the end user see? Shall the transaction be committed at all etc? Are you happy to wrap any a.intervals.add(...) in a try/except?

This is how I imagine adding an interval:

1.) Form validation happens and we know that other fields already failed validation (interval will not be added to the session) and we want to check if the interval also fails validation:

### Start request
# Prevalidation
Interval.check_valid(a, start, end)

...
#### End request

2.) All form fields passed validation, we want to check if interval would be valid if committed (this might not be needed) and then commit the interval:

# Start request
# Basic validation at the time of addition
try:
    interval = Interval(a, b, start, end)
except SomeException:
    return("This interval overlaps another, please fix this!")

...

# Main validation when the interval relation is committed to database
try:
    session.flush() # commit the session to the database
except AnotherException: # maybe IntegrityError or something similar
    return("This interval overlaps another, please fix this!")
### End request

I'm not sure if the Basic validation is really needed though. The Prevalidation is needed because if someone POSTs the form and we see some errors for other fields we would like to get all invalid fields listed in the response. The reason for this is that I don't want the user to POST the form several times with coming back to him again and again with newer errors, but I would like to list all the errors at once that I'm able to check.

Looking at the solution van provided it does do the Basic validation, but there might be some edge cases where this could cause problems. This is a web application so it is possible that two different users are using different sessions, so when doing i1 = Interval(a1, b1, start1, end1), i1 does not appear in other sessions until the registry is refreshed from database in that session. (At least this is what I think how it works.)

  1. Users tries to add an interval (i1) that that does not overlap any intervals in the database. Interval(a1, b1, start1, end1)
  2. Users tries to add an interval (i2) that that does not overlap any intervals in the database, but overlaps with i1. Interval(a1, b2, start2, end2)
  3. Both validations might succeed as the two users might be using different sessions.
  4. Session i1 is belonging to is flush()-ed and committed to database without errors.
  5. Session i2 is belonging to is flush()-ed and committed to database without errors.

Is the situation listed here possible or am I misunderstanding something?

I was also thinking about adding UPDATE and INSERT triggers to the 'interval' table to do this check. I'm not sure if this is the right way to go and if that provides the atomicity that will ensure that concurrent attempts like listed above will not cause a problem. If this is the right way to go my question would be what the right way is for creating those triggers when calling Base.metedata.create_all(). Is this the only way for doing this or is there a possibility to somehow attach this to the model and let create_all() create it: DELIMITER / Creating a trigger in SQLAlchemy

Update 2:

The right algorithm for checking if the intervals collide seems to be:

interval.start <= end and start <= interval.end

I found that the right approach for ensuring the appropriate atomicity is to just use select for update in the method where the overlap is checked. This works good with MySQL at least.

Upvotes: 1

Views: 1350

Answers (1)

van
van

Reputation: 76962

First of all: your check is not sufficient to verify the overlap, as it does not cover the case where one interval is completely contained in the other. Please see the check in the validator code below.

Then: this is not a straightforward check to do on sqlalchemy side. For start please have a look at Simple Validators documentation. Your implementation might look like this:

class A(Base):
    __tablename__ = 'a'
    id = Column(Integer, primary_key=True)

    @validates('intervals', include_backrefs=True, include_removes=False)
    def validate_overlap(self, key, interval):
        assert key == 'intervals'
        # if already in a collection, ski the validation
        #   this might happen if same Interval was added multiple times
        if interval in self.intervals:
            return interval
        # assert that no other interval overlaps
        overlaps = [i for i in self.intervals
                if ((i.start >= interval.start and i.start <= interval.end) or
                    (i.end >= interval.start and i.end <= interval.end) or
                    (i.start <= interval.start and i.end >= interval.start) or
                    (i.start <= interval.end and i.end >= interval.end)
                    )
                ]
        assert not(overlaps), "Interval overlaps with: {}".format(overlaps)
        return interval

Now the sample code like this should work, where by "work" i mean "validation code runs and assert exception is thrown when an interval with overlaps is added":

session.expunge_all()
a1, b1 = _query_session_somehow(...) # assume that a1 has no intervals yet
i1 = Interval(b=b1, start=date(2013, 1, 1), end=date(2013, 1, 31))
a1.intervals.append(i1)
i2 = Interval(b=b1, start=date(2013, 2, 1), end=date(2013, 2, 28))
a1.intervals.append(i2)
i3 = Interval(b=b1, start=date(2013, 2, 8), end=date(2013, 2, 19))
try:
    a1.intervals.append(i3)
except Exception as exc:
    print "ERROR", exc
session.commit()
assert 2 == len(a1.intervals)

What you should be aware of is the following comment from the same documentation page. Verbatim:

Note that a validator for a collection cannot issue a load of that collection within the validation routine - this usage raises an assertion to avoid recursion overflows. This is a reentrant condition which is not supported.

So, if we were to modify the usage code a bit and add the other side of the relationship first, you should expect sqlalchemy exception AssertionError: Collection was loaded during event handling. and the code would not work because of this limitation:

session.expunge_all()
a1, b1 = _query_session_somehow(...)
# a1.intervals # @note: uncomment to make sure the collection is loaded
i1 = Interval(a=a1, b=b1, start=date(2013, 1, 1), end=date(2013, 1, 31))

What we did here is we added the other side of the relationship first, but our validator would need to load the intervals collection, which would cause the limitation to trigger in case the a1.intervals were not loaded first.

One can probably work-around this by making sure that the a.intervals is always loaded. Uncommenting the commented line in the code snippet above should make it work.

As there are few tricky points with this implementation, you might take a look at Session Events instead.

But given that you have the validation figured out:

  • What is your workflow in case of validation failure?
  • What will you do with those Intervals which failed validation and were not added to a?
    • note that they will still be committed to DB because they belong to b1 but have a_id = NULL (you do allow NULLs in your current model)
  • What should the end user see?
  • Shall the transaction be committed at all etc?
  • Are you happy to wrap any a.intervals.add(...) in a try/except?

Upvotes: 1

Related Questions