Martin James
Martin James

Reputation: 146

Unexpected result with ORDER BY

I have the following query:

SELECT    
    D.[Year] AS [Year]
    , D.[Month] AS [Month]
    , CASE
        WHEN f.Dept IN ('XSD') THEN 'Marketing'
        ELSE f.Dept 
    END AS DeptS 
    , COUNT(DISTINCT f.OrderNo) AS CountOrders
FROM Sales.LocalOrders AS l WITH 
INNER JOIN Sales.FiscalOrders AS f 
    ON l.ORDER_NUMBER = f.OrderNo
INNER JOIN Dimensions.Date_Dim AS D 
    ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE)
WHERE YEAR(f.OrderDate) = 2019
AND f.Dept IN ('XSD', 'PPM', 'XPP')
GROUP BY 
    D.[Year]
    , D.[Month]
    , f.Dept 
ORDER BY 
    D.[Year] ASC
    , D.[Month] ASC

I get the following result the ORDER BY isn't giving the right result with Month column as we can see it is not ordered:

Year    Month   Depts       CountOrders
2019    1       XSD         200
2019    10      PPM         290
2019    10      XPP         150
2019    2       XSD         200
2019    3       XPP         300

The expected output:

Year    Month   Depts       CountOrders
2019    1       XSD         200
2019    2       XSD         200
2019    3       XPP         300
2019    10      PPM         290
2019    10      XPP         150

Upvotes: 2

Views: 314

Answers (5)

Gordon Linoff
Gordon Linoff

Reputation: 1269563

The simplest solution is:

ORDER BY MIN(D.DATE)

or:

ORDER BY MIN(f.ORDER_DATE)

Fiddling with the year and month columns is totally unnecessary when you have a date column that is available.

Upvotes: 1

ed2
ed2

Reputation: 1497

Your query

It is ordered by month, as your D.[Month] is treated like a text string in the ORDER BY clause.

You could do one of two things to fix this:

  • Use a two-digit month number (e.g. 01... 12)
  • Use a data type for the ORDER BY clause that will be recognized as representing a month

A quick fix

You can correct this in your code by quickly changing the ORDER BY clause to analyze those columns as though they are numbers, which is done by converting ("casting") them to an integer data type like this:

ORDER BY 
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC

This will correct your unexpected query results, but does not address the root cause, which is your underlying data (more on that below).

Your underlying data

The root cause of your issue is how your underlying data is stored and/or surfaced.

Your Month seems to be appearing as a default data type (VarChar), rather than something more specifically suited to a month or date.

If you administer or have access to or control over the database, it is a good idea to consider correcting this.

In considering this, be mindful of potential context and change management issues, including:

  • Is this underlying data, or just a representation of upstream data that is elsewhere? (e.g. something that is refreshed periodically using a process that you do not control, or a view that is redefined periodically)
  • What other queries or processes rely on how this data is currently stored or surfaced (including data types), that may break if you mess with it?
  • Might there be validation issues if correcting it? (such as from the way zero, null, non-numeric or non-date data is stored, even if invalid)
  • What change management practices should be followed in your environment?
  • Is the data source under high transactional load?
  • Is it a production dataset?
  • Are other reporting processes dependent on it?

None of these issues are a good excuse to leave something set up incorrectly forever, which will likely compound the issue and introduce others. However, that is only part of the story.

The appropriate approach (correct it, or leave it) will depend on your situation. In a perfect textbook world, you'd correct it. In your world, you will have to decide.

A better way?

The above solution is a bit of a quick and nasty way to force your query to work.

The fact that the solution CASTs late in the query syntax, after the results have been selected and filtered, hints that is not the most elegant way to achieve this.

Ideally you can convert data types as early as possible in the process:

  • If done in underlying data, not the query, this is the ultimate but may not suit the situation (see below)
  • If done in the query, try to do it earlier.

In your case, your GROUP BY and ORDER BY are both using columns that look to be redundant data from the original query results, that is, you are getting a DATE and a MONTH and a YEAR. Ideally you would just get a DATE and then use the MONTH or YEAR from that date. Your issue is your dates are not actually dates (see "underlying data" above), which:

  • In the case of DATE, is converted in your INNER JOIN line ON CAST(D.[Date] AS DATE) = CAST(f.OrderDate AS DATE) (likely to minimise issues with the join)
  • In the case of D.[year] and D.[month], are not converted (which is why we still need to convert them further down, in ORDER BY)

You could consider ignoring D.[month] and use the MONTH DATEPART computed from DATE, which would avoid the need to use CAST in the ORDER BY clause.

In your instance, this approach is a middle ground. The quick fix is included at the top of this answer, and the best fix is to correct the underlying data. This last section considers optimizing the quick fix, but does not correct the underlying issue. It is only mentioned for awareness and to avoid promoting the use of CAST in an ORDER BY clause as the most legitimate way of addressing your issue with good clean query syntax.

There are also potential performance tradeoffs between how many columns you select that you don't need (e.g. all of the ones in D?), whether to compute the month from the date or a seperate month column, whether to cast to date before filtering, etc. These are beyond the scope of this solution.

So:

  • The immediate solution: use the quick fix
  • The optimal solution: after it's working, consider the underlying data (in your situation)

Upvotes: 5

Thom A
Thom A

Reputation: 95554

The real problem is your object Dimensions.Date_Dim here. As you are simply ordering on the value of D.[Year] and D.[Month] without manipulating the values at all, this means the object is severely flawed; you are storing numerical data as a varchar. varchar, and numerical data types are completely different. For example 2 is less than 10 but '2' is greater than '10'; because '2' is greater than '1', so therefore it must also be greater than '10'.

The real solution, therefore, is fixing your object. Assuming that both Month and Year are incorrectly stored as a varchar, don't have any non-integer values (another and different flaw if so), and not a computed column then you could just do:

ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Year] int NOT NULL;
ALTER TABLE Dimensions.Date_Dim ALTER COLUMN [Month] int NOT NULL;

You could, however, also make the columns a PERSISTED computed column, which might well be easier, in my opinion, as DATEPART already returns a strongly typed int value.

ALTER TABLE dbo.Date_Dim DROP COLUMN [Month];
ALTER TABLE dbo.Date_Dim ADD [Month] AS DATEPART(MONTH,[Date]) PERSISTED;

Of course, for both solutions, you'll need to (first) DROP and (afterwards) reCREATE any indexes and constraints on the columns.

Upvotes: 2

Amira Bedhiafi
Amira Bedhiafi

Reputation: 1

A very common issue when you store numerical data as a varchar/nvarchar. Try to cast Year and Month to INT.

ORDER BY 
CAST(D.[Year] AS INT) ASC
,CAST(D.[Month] AS INT) ASC

If you try using the <, > and BETWEEN operators, you will get some really "weird" results.

Upvotes: 0

jigga
jigga

Reputation: 614

As long as your "Month" is always 1-12, you can use

SELECT ..., TRY_CAST(D.[Month] AS INT) AS [Month],...
ORDER BY TRY_CAST(D.[Month] AS INT) 

Upvotes: 1

Related Questions