Hemant Sharma
Hemant Sharma

Reputation: 83

How do I use distinct for a column along with a where clause in sql server 2008?

I want to get the distinct value of a particular column however duplicity is not properly managed if more than 3 columns are selected.

The query is:

SELECT DISTINCT 
  ShoppingSessionId, userid 
FROM 
  dbo.tbl_ShoppingCart 
GROUP BY 
  ShoppingSessionId, userid 
HAVING
  userid = 7

This query produces correct result, but if we add another column then result is wrong.

Please help me as I want to use the ShoppingSessionId as a distinct, except when I want to use all the columns from the table, including with the where clause .

How can I do that?

Upvotes: 0

Views: 5024

Answers (1)

Andriy M
Andriy M

Reputation: 77707

The DISTINCT keyword applies to the entire row, never to a column.

Presently DISTINCT is not needed at all, because your script already makes sure that ShoppingSession is distinct: by specifying the column in GROUP BY and filtering on the other grouping column (userid).

When you add a third column to GROUP BY and it results in duplicated ShoppingSession, it means that some ShoppingSession values are associated with many different values of the added column.

If you want ShoppingSession to remain distinct after including that third column, you should decide which values of the the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN() or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.

Here's an illustration of what I'm talking about:

SELECT
  ShoppingSessionId,
  userid,
  MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
  ShoppingSessionId,
  userid
HAVING userid = 7

There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:

SELECT
  ShoppingSessionId,
  userid,
  MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
  ShoppingSessionId,
  userid

Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.

You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:

SELECT
  ShoppingSessionId,
  MAX(userid) AS userid,
  MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
  ShoppingSessionId

Since all userid values in your output are supposed to contain 7 (because that's in your filter), you can just pick a maximum value per every ShoppingSession, knowing that it'll always be 7.

Upvotes: 5

Related Questions