Ryan
Ryan

Reputation: 323

Order By sorts differently than Sort Transformation in SSIS (Integration services)

Using SSIS I've encountered issues when joining due to sorting using two OLE DB Sources.

I've discovered that if I use an order by vs. a sort transformation it produces different results that causes a merge join to product incorrect data

Example:

Table1

Id int PK
JoinKey varchar(25)
OriginalValue varchar(25)

Table2

Id int PK
JoinKey varchar(25)
ExpectedValue varchar(25)

The Program Flow in SSIS is as follows:

  1. OLE Data source selecting from Table1
  2. Sort Transformation on Table1 by JoinKey asc then ExpectedValue asc
  3. OLE Data source selecting from Table2 Order By JoinKey asc
  4. Merge Join Transformation (left outer join) with Table1 and Table2 on JoinKey selecting OriginalValue and ExpectedValue

In doing this - I end up with a null value for ExpectedValue, for some of my data set. Doing the query directly in SQL returns the values expected.

If I add a Sort Transformation before step 5, it correctly joins the ExpectedValue however then SSIS has the warning "Validation warning. The data is already sorted as specified so the transform can be removed."

Two questions:

  1. Shouldn't using an order by and a sort transformation produce the same results?
  2. Why does SSIS indicate a warning when the sort is actually needed?

I've browse some related posts but they seem to focus on efficiency not differences in functionality.

  1. Is it better to sort data at the application layer, or with an order by clause?
  2. why we need to use sort transformation before merge join transformation in ssis
  3. SQL Server Integration Service Left Join

Upvotes: 4

Views: 4033

Answers (2)

Ryan
Ryan

Reputation: 323

Found a Microsoft post indicating the reason. Marking Tab Alleman's response as the answer as it is what led me to finding this.

The Sort transformation uses Windows collation to sort string values.

You cannot use the ORDER BY clause alone because the ORDER BY clause uses a SQL Server collation to sort string values. The use of the SQL Server collation might result in a different sort order than Windows collation, which can cause the Merge or Merge Join transformation to produce unexpected results.

https://msdn.microsoft.com/en-us/library/ms137653.aspx

Upvotes: 1

Tab Alleman
Tab Alleman

Reputation: 31785

I have experienced this in the past and the answer is NO, sorting with an ORDER BY in your Source SQL does not always produce the same results as using a Sort Transformation in your Dataflow task. There is a difference in the way certain special (non alpha-numeric) characters (and maybe NULLS) are handled; I can't remember exactly which. I don't recall finding any documentation about this (it was years ago), but I confirmed it through my own testing.

My conclusion was, to do a JOIN in the dataflow, make sure you use the same method of sorting on both sides of the JOIN.

As for your second question, SSIS has an IsSorted property on a datasource. If you set the IsSorted property on the DataSource to true, then if you try to do a Sort Transformation, all it knows is that IsSorted is true and it will give the warning that sorting is not needed. It doesn't know that the Sort Transformation is "needed" in order to match another source that used a Sort Transformation.

If you want to use Sort Transformations on both sides of the JOIN, set the IsSorted property of your source to false.

Upvotes: 3

Related Questions