Reputation: 81
I have two versions of a subquery using SQLAlchemy ORM:
subq1 = session.query(su.DistCode,dr.RtrId, su.RtrCode, su.InvoiceNo, su.SlabId, sh.SchId, sslab.PurQty, sslab.DiscPer, sslab.FlatAmt).\
join(sh).\
join(dr, and_(dr.DistCode==su.DistCode, dr.RtrCode==su.RtrCode)).\
join(sslab,and_(su.SlabId==sslab.SlabId, sh.SchId==sslab.SchId)).\
subquery()
and the other being:
subq1 = session.query(su.DistCode,dr.RtrId, su.RtrCode, su.InvoiceNo, su.SlabId, sh.SchId).\
join(sh).\
join(dr, and_(dr.DistCode==su.DistCode, dr.RtrCode==su.RtrCode)).\
subquery()
The only difference between the two is the inclusion of a join with:
.join(sslab,and_(su.SlabId==sslab.SlabId, sh.SchId==sslab.SchId))
I use both versions with the following code, one after the other.
subq2 = session.query(ds.DistCode, ds.RtrId, ds.PrdCde, ds.SalInvDte, ds.SalInvNo,
(ds.SalInvQty*ds.SelRateBeforTax).label('SBT'), ds.SalInvSch,
pdet.ProductId, dr.RtrChannelCode, dr.GeoName, dr.RtrClassCode, dr.RtrCode,
dr.RtrGroupCode).join(pdet).outerjoin(dr, and_(ds.DistCode==dr.DistCode, ds.RtrId==dr.RtrId)).subquery()
rset = session.query(subq2.c.DistCode, subq2.c.RtrId, subq2.c.RtrCode, subq2.c.SalInvNo,
subq2.c.SalInvDte, subq2.c.PrdCde, subq2.c.ProductId, subq2.c.SBT, subq2.c.SalInvSch,
subq2.c.RtrChannelCode, subq2.c.RtrClassCode, subq2.c.RtrGroupCode,
subq2.c.GeoName, subq1.c.SlabId, subq1.c.SchId).join(subq1,and_(subq1.c.DistCode==subq2.c.DistCode, subq1.c.RtrId==subq2.c.RtrId, subq1.c.InvoiceNo==subq2.c.SalInvNo)).join(spid,and_(subq2.c.ProductId==spid.ProductID, subq1.c.SchId==spid.SchemeID))
df = pd.read_sql(rset.statement, rset.session.bind)
The results gobsmack me. The first query gets into an infinite loop (or something that takes 10 hours+; while the other takes all of 26 seconds!
As a way of debugging through this issue, I ran both versions of 'subq1' as independent queries and both work absolutely fine - running in less than 3 seconds.
Any ideas on how I can get to the bottom of this issue?
Upvotes: 0
Views: 368
Reputation: 81
In today's world of quantum computing and self driven cars, I would expect multiple joins to be a simple problem. Turns out it is.
My colleague suggested this answer and it worked. I was missing out on group_by. group_by on subquery apparently reduces processing time significantly.
All I had to do was alter subq1 as:
subq1 = session.query(su.DistCode,dr.RtrId, su.RtrCode, su.InvoiceNo, su.SlabId,
sh.SchId, sslab.PurQty.label('PQ'),
sslab.DiscPer.label('DP'),
sslab.FlatAmt.label('FA')).join(sh).join(dr, and_(dr.DistCode==su.DistCode, dr.RtrCode==su.RtrCode)).join(sslab,and_(su.SlabId==sslab.SlabId, sh.SchId==sslab.SchId)).group_by(su.DistCode,dr.RtrId, su.RtrCode, su.InvoiceNo, su.SlabId, sh.SchId, sslab.DiscPer, sslab.FlatAmt, sslab.PurQty).subquery()
Notice the group_by at the end. Worked like a charm. runs in less than a minute.
Upvotes: 1