Reputation: 8963
I'm trying to perform a per-field merge between entities in my SQLAlchemy datasource and entities that are pulled in from somewhere else (foreign REST API). What I want to do is something like this:
class Person:
__tablename__ = "people"
id = Column(Integer,primary_key=True)
name = Column(String)
nameDatestamp = Column(DateTime)
address = Column(String)
addressDatestamp = Column(DateTime)
def merge(myPerson, foreignPerson):
if myPerson.nameDateStamp < foreignPerson.nameDateStamp:
myPerson.name = foreignPerson.name
myPerson.nameDateStamp = foreignPerson.nameDateStamp
if myPerson.addressDatestamp < foreignPerson.addressDateStamp:
myPerson.addressDatestamp = foreignPerson.addressDateStamp
myPerson.address = foreignPerson.address
For lots of classes and lots of fields. This seems too verbose to be best-practice.
I could introduce new data models e.g. DateStampedString (which would consist of a datestamp and a string), DateStampedRelationship, etc., but I am concerned that the additional indirection of using multiple tables would contribute to the speed of the merge
I could use Python decorators with arguments that would dynamically add the additional datestamp columns to my model at runtime and before the table is created e.g.
@datestamp(name,address)
class Person:
...
Maybe I could somehow exploit sqlalchemy.types.TypeDecorator
, to build a new datatype, but it seems to be interested in going from (opaque type)->(sqlalchemy type) rather than bundling two types together.
Is there a best practice for what I am trying to do?
EDIT: I'm looking for
Upvotes: 1
Views: 897
Reputation: 17076
You can easily iterate over (thing
, datestamp
) through the query object. For example, if you wanted to get address and datestamp you could do:
session.query(Person.address, Person.addressDatestamp).all()
which would return a set of (address
, addressDatestamp
) tuples. (they are actually namedtuples, but you can just use the index instead). You don't actually need to do this if you have a bunch of attributes you want to update. One way you could do this dynamically is to pass merge a list of attribute tuples and a query that is a tuple of (Person, foreignPerson) and do the following:
attrs = [("address", "addressDatestamp"), ("name", "nameDatestamp")]
person_tuples = # some way to generate (Person, ForeignPerson) tuples
def merge(attrs, person_tuples):
for person, foreign in person_tuples:
for attr, date in attrs:
if getattr(person, date) < getattr(foreign, date):
setattr(person, attr) = getattr(foreign, attr)
setattr(person, date) = getattr(foreign, date)
return person_tuples
This checks each attribute's datestamp, and then stores the attribute if the foreign is more recent (+ also stores the date).
if you attributes are always in the form <attr>
and <attr>Datestamp
then you could shorten this to:
attrs = ["name", "address"]
def merge(attrs, person_tuples):
for person, foreign in person_tuples:
for attr in attrs:
date = attr + "Datestamp"
if getattr(person, date) < getattr(foreign, date):
setattr(person, attr) = getattr(foreign, attr)
setattr(person, date) = setattr(foreign, date)
If the attribute may sometimes not be present, you can change the getattr call to getattr(object, attr, default)
and it won't raise an error.
If you want to be able to dynamically generate models with datestamps, you could use a metaclass (a bit more complicated, particularly because it interferes with SQLA's declarative base, etc) or you could create a class factory, like the following:
def datestamped_factory(class_name, attrlist, timestamp="Datestamp", superclass_list=None):
superclass_list = superclass or (object,)
cols = dict((attr, Column(String)) for attr in attrlist)
cols.update(dict((attr + timestamp, Column(DateTime)) for attr in attrlist)
cols["timestamped_attrs"] = attrlist
# create a merge specific to the class (so only need to pass person_tuples)
cols["merge"] = classmethod(lambda cls, person_tuples: merge(cls.timestamped_attrs, person_tuples))
return type(class_name, superclass_list, cols)
(can just add that to the factory with cols["class_merge"] = classmethod(lambda cls, person_tuples: merge(cls.timestamped_attrs, person_tuples
)))
To create your person method, you could just do something like the following:
class Base(sqlalchemy.declarative_base()):
id = Column(Integer, primary_key=True)
Person = datestamped__factory("Person", ["name", "address"], superclass_list = (Base,))
Person.__tablename__ = "person"
(replace sqlalchemy.declarative_base()
with whatever Base class you are using...assuming you're using the ORM).
You could get fancier and write up a metaclass that finds all the columns that are strings and adds datestamps to them + creates appropriate merge and create methods that update timestamps appropriately, but that's probably more fancy than you need to get.
Upvotes: 3