Jan Katins
Jan Katins

Reputation: 2319

SQLAlchemy - Run autoload again

I've a web app which lets user run some ETL and also exposes the final table via SQLAlchemy models in a UI. The models use the auto_load feature of SQLAlchemy:

Base = declarative_base(name='Model')
class Product(Base):
    __table__ = Table('product', Base.metadata,
                      Column('product_number', Text, primary_key=True),
                      autoload=True, autoload_with=engine, schema='data')

Sometimes the ETL results in a different table structure (new/removed/renamed columns, table name is still the same) and up to now I have to restart the whole web app to update the autoloaded SQLAlchemy models. Is there a way to rerun the autoload process on my models?

Upvotes: 2

Views: 464

Answers (1)

Jan Katins
Jan Katins

Reputation: 2319

I've now build a accessor class which can be used similar to a module:

from sqlalchemy.ext.declarative import declarative_base
# in my case this saves the last timestamp when the db was 
# updated into a memcache 
from app.elsewhere import get_last_database_change_time
class Models():
    """A module-like accessor which allows to reset the models.
    """

    _models = {}

    def __init__(self):
        self.Base = declarative_base(name='Base')
        self._last_reset = get_last_database_change_time()

    def reset(self):
        """Resets all models so that they are autoloaded again the next time they are accessed via the properties"""
        self._models.clear()
        # without resetting Base, we get errors that the same table is declared twice...
        # extend_existing = True was not enough :-(
        self.Base = declarative_base(name='Base')

    def __reset_if_needed(self):
        """Resets all models after the database changed"""
        # this is backed by a memcached value which gets reset when the ETL finishes
        # needed as we do not have direct access to this object when we run the ETL
        last_reset = get_last_database_change_time() 
        if last_reset != self._last_reset:
            self.reset()
            self._last_reset = last_reset

    @property
    def Model1(self):
        self.__reset_if_needed()
        if 'Model1' not in self._models:
            class Model1(self.Base):
                __tablename__ = 'table_name'
                __table_args__ = dict(schema='schema_name', autoload=True, autoload_with=some_engine,
                                      extend_existing=True)
                id = Column(Text, primary_key=True)
            self._models['Model1'] = Model1
        return self._models['Model1']


    def __declare_main_models(self):
        """Helper to declare the main models in one go due to dependency issues"""
        class Model2(self.Base):
            # model with dependency on/relationship with Model3
        class Model3(self.Base):
            # model with dependency on/relationship with Model2

        self._models['Model2'] = Model2
        self._models['Model3'] = Model3


    @property
    def Model2(self):
        self.__reset_if_needed()
        if 'Model2' not in self._models:
            self.__declare_main_models()
        return self._models['Model2']
models = Models()

Use like:

from app.database import models as m
p = m.Model1(...)

For some models, I had to declare them in one go as they had relationships with each other and it was a pain to sort that out.

For a 'read only' case this works nicely in a gunicorn with multiple workers (=processes). It's probably not safe to use in a multi-threading app...

Upvotes: 1

Related Questions