Magnus
Magnus

Reputation: 760

Getting correct sums with multiple joins

I'm analyzing throughput data for my university. Right now I'm collecting background variables including qualifications (Grades of points on SAT equivalent) and results from different high school courses thought to be correlated with academic achievement.

I'm also supposed to calculate accumulated academic credits at the start of the program, which has proven to be somewhat of a hassle.

My Query is as follows:

SELECT PNR, ZIPCODE, SEL_CRITERION_ID, SR.QUAL_RATING, QUEUENUMBER, 
PROGRAM_ID, DUC.COURSE_ID, DUC.MARK, SR.COURSEOFFERING_ID, SR.ADMISSIONROUND_ID, RESULT, SR.WITHIN_PROGRAM , SUM(C.CREDIT)
FROM DATABASE.PERSON P 
LEFT JOIN DATABASE.PERSON_ADDRESS PA ON P.PERSON_ID=PA.PERSON_ID 
LEFT JOIN DATABASE.DIPLOMA DU ON P.PERSON_ID=DU.PERSON_ID 
LEFT JOIN DATABASE.DIPLOMA COURSE DUC ON DU.DIPLOMA_UPSEC_ID=DUC.DIPLOMA_UPSEC_ID 
LEFT JOIN DATABASE.SINGLE_COURSE_QUAL SUCQ ON SUCQ.PERSON_ID=P.PERSON_ID
LEFT JOIN DATABASE.PICKED_COURSE PCO ON P.PERSON_ID=PCO.PERSON_ID  
LEFT JOIN DATABASE.SEL_RESULT SR ON P.PERSON_ID=SR.PERSON_ID AND 

SR.COURSEOFFERING_ID=PCO.COURSEOFFERING_ID AND SR.ADMISSIONROUND_ID=PCO.ADMISSIONROUND_ID 
    LEFT JOIN DATABASE.COURSE_REG_QUAL CRQ ON SR.PERSON_ID=CRQ.PERSON_ID
    LEFT JOIN DATABASE.COURSE C ON CRQ.EDUCATIONORG_ID=C.EDUCATIONORG_ID AND CRQ.COURSE_ID=C.COURSE_ID
    WHERE SR.COURSEOFFERING_ID='97114' 
        AND SR.ADMISSIONROUND_ID='AT2018' 
        AND RESULT=20 
        AND     (
                DUC.COURSE_ID='SV1201' 
                OR DUC.COURSE_ID ='COURSE1'
                OR DUC.COURSE_ID ='COURSE2' 
                OR DUC.COURSE_ID ='COURSE3'
                OR DUC.COURSE_ID IS NULL)
    GROUP BY PNR, ZIPCODE, SEL_CRITERION_ID, SR.QUAL_RATING, QUEUENUMBER, 
    UPSEC_ID, DUC.COURSE_ID, DUC.MARK, SR.COURSEOFFERING_ID, SR.ADMISSIONROUND_ID, RESULT, SR.WITHIN_PROGRAM
    ORDER BY PNR

Since there are multiple possible values of qualification (sel_results), this generates a minimum of two rows per person. Since there are three possible values for course_ID, this makes for six rows in total.

Naturally then, the academic credits are summed up up to six times per person (though this is not consistent) which makes the calculation unreliable.

I would need some way to obtain one (correct) sum of academic credits per observation. I can work with having the same sum repeated six times in a row as long as the sum for each person is correct.

I have been looking at this answer, which reads like this:

,    <earlier columns>     ,    (   select SUM(payments.amount) 
         from payments 
         where payments.orderID = orders.id
     ) AS totalPayments ,    <later columns>

But I'm simply not good enough to generalize and I can't wrap my head around why you can use an address in the WHERE-statement which hasn't yet been specified in the FROM-statement. This doesn't seem to work with our system.

Upvotes: 0

Views: 31

Answers (1)

avery_larry
avery_larry

Reputation: 2135

The subquery idea untested

SELECT PNR, isnull(CR.CREDITS,0) AS CREDITS,
   ...
FROM DATABASE.PERSON P
LEFT OUTER JOIN (
   SELECT CRQ.PERSON_ID, SUM(C.CREDIT) AS CREDITS
   -- I'm not really sure how to associate a person with the credits
   -- but hopefully you'll get the idea here.
   FROM DATABASE.PICKED_COURSE PCO
   INNER JOIN DATABASE.COURSE_REG_QUAL CRQ on CRQ.COURSE_ID=C.COURSE_ID
   GROUP BY CRQ.PERSON_ID
) CR on CR.PERSON_ID = P.PERSON_ID
LEFT JOIN DATABASE.PERSON_ADDRESS PA ON P.PERSON_ID=PA.PERSON_ID 
LEFT JOIN DATABASE.DIPLOMA DU ON P.PERSON_ID=DU.PERSON_ID 
...

Upvotes: 1

Related Questions