Danosaure
Danosaure

Reputation: 3655

sqlalchemy: union query few columns from multiple tables with condition

I'm trying to adapt some part of a MySQLdb application to sqlalchemy in declarative base. I'm only beginning with sqlalchemy.

The legacy tables are defined something like:

student: id_number*, semester*, stateid, condition, ...
choice: id_number*, semester*, choice_id, school, program, ...

We have 3 tables for each of them (student_tmp, student_year, student_summer, choice_tmp, choice_year, choice_summer), so each pair (_tmp, _year, _summer) contains information for a specific moment.

select *
from `student_tmp`
    inner join `choice_tmp` using (`id_number`, `semester`)

My problem is the information that is important to me is to get the equivalent of the following select:

SELECT t.*
FROM (
        (
            SELECT st.*, ct.*
            FROM `student_tmp` AS st
                INNER JOIN `choice_tmp` as ct USING (`id_number`, `semester`)
            WHERE (ct.`choice_id` = IF(right(ct.`semester`, 1)='1', '3', '4'))
                AND (st.`condition` = 'A')
        ) UNION (
            SELECT sy.*, cy.*
            FROM `student_year` AS sy
                INNER JOIN `choice_year` as cy USING (`id_number`, `semester`)
            WHERE (cy.`choice_id` = 4)
                AND (sy.`condition` = 'A')
        ) UNION (
            SELECT ss.*, cs.*
            FROM `student_summer` AS ss
                INNER JOIN `choice_summer` as cs USING (`id_number`, `semester`)
            WHERE (cs.`choice_id` = 3)
                AND (ss.`condition` = 'A')
        )
    ) as t

* used for shorten the select, but I'm actually only querying for about 7 columns out of the 50 availables.

This information is used in many flavors... "Do I have new students? Do I still have all students from a given date? Which students are subscribed after the given date? etc..." The result of this select statement is to be saved in another database.

Would it be possible for me to achieve this with a single view-like class? The information is read-only so I don't need to be able to modify/create/delte. Or do I have to declare a class for each table (ending up with 6 classes) and every time I need to query, I have to remember to filter?

Thanks for pointers.

EDIT: I don't have modification access to the database (I cannot create a view). Both databases may not be on the same server (so I cannot create a view on my second DB).

My concern is to avoid the full table scan before filtering on condition and choice_id.

EDIT 2: I've set up declarative classes like this:

class BaseStudent(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    unique_id_number = sqlalchemy.Column(sqlalchemy.String(7))
    stateid = sqlalchemy.Column(sqlalchemy.String(12))
    condition = sqlalchemy.Column(sqlalchemy.String(3))

class Student(BaseStudent, Base):
    __tablename__ = 'student'

    choices = orm.relationship('Choice', backref='student')

#class StudentYear(BaseStudent, Base):...
#class StudentSummer(BaseStudent, Base):...

class BaseChoice(object):
    id_number = sqlalchemy.Column(sqlalchemy.String(7), primary_key=True)
    semester = sqlalchemy.Column(sqlalchemy.String(5), primary_key=True)
    choice_id = sqlalchemy.Column(sqlalchemy.String(1))
    school = sqlalchemy.Column(sqlalchemy.String(2))
    program = sqlalchemy.Column(sqlalchemy.String(5))


class Choice(BaseChoice, Base):
    __tablename__ = 'choice'

    __table_args__ = (
            sqlalchemy.ForeignKeyConstraint(['id_number', 'semester',],
                [Student.id_number, Student.semester,]),
            )

#class ChoiceYear(BaseChoice, Base): ...
#class ChoiceSummer(BaseChoice, Base): ...

Now, the query that gives me correct SQL for one set of table is:

q = session.query(StudentYear, ChoiceYear) \
            .select_from(StudentYear) \
            .join(ChoiceYear) \
            .filter(StudentYear.condition=='A') \
            .filter(ChoiceYear.choice_id=='4')

but it throws an exception...

"Could not locate column in row for column '%s'" % key)
sqlalchemy.exc.NoSuchColumnError: "Could not locate column in row for column '*'"

How do I use that query to create myself a class I can use?

Upvotes: 2

Views: 4627

Answers (2)

Danosaure
Danosaure

Reputation: 3655

This is to show what I ended up doing, any comment welcomed.


class JoinedYear(Base):
    __table__ = sqlalchemy.select(
            [
                StudentYear.id_number,
                StudentYear.semester,
                StudentYear.stateid,
                ChoiceYear.school,
                ChoiceYear.program,
                ],
                from_obj=StudentYear.__table__.join(ChoiceYear.__table__),
                ) \
                .where(StudentYear.condition == 'A') \
                .where(ChoiceYear.choice_id == '4') \
                .alias('YearView')

and I will elaborate from there...

Thanks @van

Upvotes: 1

van
van

Reputation: 76992

If you can create this view on the database, then you simply map the view as if it was a table. See Reflecting Views.

# DB VIEW
CREATE VIEW my_view AS -- @todo: your select statements here

# SA
my_view = Table('my_view', metadata, autoload=True)
# define view object
class ViewObject(object):
    def __repr__(self):
        return "ViewObject %s" % str((self.id_number, self.semester,))
# map the view to the object
view_mapper = mapper(ViewObject, my_view)

# query the view
q = session.query(ViewObject)
for _ in q:
    print _

If you cannot create a VIEW on the database level, you could create a selectable and map the ViewObject to it. The code below should give you the idea:

student_tmp = Table('student_tmp', metadata, autoload=True)
choice_tmp = Table('choice_tmp', metadata, autoload=True)
# your SELECT part with the columns you need
qry = select([student_tmp.c.id_number, student_tmp.c.semester, student_tmp.stateid, choice_tmp.school])
# your INNER JOIN condition
qry = qry.where(student_tmp.c.id_number == choice_tmp.c.id_number).where(student_tmp.c.semester == choice_tmp.c.semester)
# other WHERE clauses
qry = qry.where(student_tmp.c.condition == 'A')

You can create 3 queries like this, then combine them with union_all and use the resulting query in the mapper:

view_mapper = mapper(ViewObject, my_combined_qry)

In both cases you have to ensure though that a PrimaryKey is properly defined on the view, and you might need to override the autoloaded view, and specify the primary key explicitely (see the link above). Otherwise you will either receive an error, or might not get proper results from the query.


Answer to EDIT-2:

qry = (session.query(StudentYear, ChoiceYear).
        select_from(StudentYear).
        join(ChoiceYear).
        filter(StudentYear.condition == 'A').
        filter(ChoiceYear.choice_id == '4')
        )

The result will be tuple pairs: (Student, Choice).
But if you want to create a new mapped class for the query, then you can create a selectable as the sample above:

student_tmp = StudentTmp.__table__
choice_tmp = ChoiceTmp.__table__
.... (see sample code above)

Upvotes: 4

Related Questions