Reputation: 83
I want to get the distinct value of a particular column however duplicity is not properly managed if more than 3 columns are selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces correct result, but if we add another column then result is wrong.
Please help me as I want to use the ShoppingSessionId
as a distinct, except when I want to use all the columns from the table, including with the where
clause .
How can I do that?
Upvotes: 0
Views: 5024
Reputation: 77707
The DISTINCT
keyword applies to the entire row, never to a column.
Presently DISTINCT
is not needed at all, because your script already makes sure that ShoppingSession
is distinct: by specifying the column in GROUP BY
and filtering on the other grouping column (userid
).
When you add a third column to GROUP BY
and it results in duplicated ShoppingSession
, it means that some ShoppingSession
values are associated with many different values of the added column.
If you want ShoppingSession
to remain distinct after including that third column, you should decide which values of the the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX()
function to that column, or MIN()
or any other suitable aggregate function. Note that the column should not be included in GROUP BY
in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING
clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE
clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7
, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.
You could go even further and exclude the userid
column from GROUP BY
and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid
values in your output are supposed to contain 7
(because that's in your filter), you can just pick a maximum value per every ShoppingSession
, knowing that it'll always be 7
.
Upvotes: 5