Reputation: 1776
I have a problem in my SQL Query
i want to select DISTINCT StudentGroups(SG) but the query gives me some repetations
here is my Query
SELECT DISTINCT(SG.SGID), en.EnrollmentID, CR.Name AS Course, INS.Name as Instructor,
S.Session, SG.StartTime, SG.EndTime, EN.CreateDate
FROM StudentGroups SG inner JOIN Enrollments EN ON SG.SGID = EN.SGID
JOIN Courses CR ON SG.CourseID = CR.CourseID
JOIN Class CL ON SG.ClassID = CL.ClassID
JOIN Instructors INS ON SG.InstructorID = INS.InstructorID
JOIN Sessions S ON SG.SessionID = S.SessionID
WHERE EN.SGID NOT IN ( SELECT SGID FROM Enrollments
WHERE StudentID = 45
UPDATED
This query gives me following data
but i don't want repeated SGID
Upvotes: 11
Views: 53126
Reputation: 1530
Please use GROUP BY clause when DISTINCT is not working for your desired output.
Upvotes: 6
Reputation:
DISTINCT
always applies to all columns returned. Putting parentheses around one column makes no difference in its behavior.
If you want your results to contain only unique values of SG.GID
, you can use a GROUP BY
clause instead--but then you have to decide some rules for which values you want to return in the other columns. You do this by using aggregate functions like MIN()
,MAX()
,COUNT()
,SUM()
, etc. Simplified example:
SELECT SG.SGID,
MIN(SG.START_TIME), --the lowest start time for this sgid.
COUNT(DISTINCT en.EnrollmentID) --the unique enrollments for this sgid.
FROM StudentGroups SG
INNER JOIN Enrollments EN ON SG.SGID = EN.SGID
GROUP BY SG.SGID;
When joining multiple tables as in your original query, you have to be careful when counting and summing things, as duplicates from the join can give you incorrect results.
Another option would be to use ROW_NUMBER()
to return one row for each SGID
:
SELECT * FROM (
SELECT SG.SGID,
SG.START_TIME,
en.EnrollmentID,
ROW_NUMBER() OVER (PARTITION BY SGID ORDER BY SG.START_TIME) as RN
FROM StudentGroups SG
INNER JOIN Enrollments EN ON SG.SGID = EN.SGID
)
WHERE RN = 1;
This numbers the rows for each SGID
, starting at 1 and sorted by the value of SG.START_TIME
. It will return the one row with the earliest start time for each SGID
. If multiple rows have the same start time, it will select any one of those, more or less at random. You could add more fields to the ORDER BY
clause to further define which rows are returned.
Upvotes: 16