Reputation: 2266
This may be a stupid question but I must ask since I see it a lot... I have inherited quite a few packages in which developers will use the the Data Conversion transformation shape when dumping flat files into their respective sql server tables. This is pretty straight forward however I always wonder why wouldn't the developer just specify the correct data types within the flat file connection and then do a straight load into the the table?
For example:
Typically I will see flat file connections with columns that are DT_STR
and then converted into the correct type within the package ie: DT_STR
of length 50 to DT_I4
. However, if the staging table and the flat file are based on the same schema - why wouldn't you just specify the correct types (DT_I4
) in the flat file connection? Is there any added benefit (performance, error handling) for using the data conversion task that I am not aware of?
Upvotes: 3
Views: 2571
Reputation: 4477
This is a good question with not one right answer. Here is the strategy that I use:
If the data source is unreliable
i.e. sometimes int or date values are strings, like when you have the literal word 'null' instead of the value being blank. I would let the data source be treated as strings and deal with converting the data downstream.
This could mean just staging the data in a table and using the database to do conversions and loading from there. This pattern avoid the source component throwing errors which is always tricky to troubleshoot. Also, it avoids having to add error handling into data conversion components.
Instead, if the database throws a conversion error, you can easily look at the data in your staging table to examine the problem. Lastly, SQL is much more forgiving with date conversions than ssis.
If the data source is reliable
If the dates and numbers are always dates and numbers, I would define the datatypes in the connection manager. This makes it clear what you are expecting from the file and makes the package easier to maintain with fewer components.
Additionally, if you go to the advanced properties of the flatfile source, integers and dates can be set to fast parse which will speed up the read time: https://msdn.microsoft.com/en-us/library/8893ea9d-634c-4309-b52c-6337222dcb39?f=255&MSPPError=-2147217396
When I use data conversion
I rarely use the data conversion component. But one case I find it useful is for converting from / to unicode. This could be necessary when reading from an ado.net source which always treats the input as unicode, for example.
Upvotes: 4
Reputation: 4610
You could change the output data type in the flat file connection manager in Advanced
page or right click the source in Data flow
, Advanced editor
to change the data type before loading it.
I think one of the benefit is the conversion
transformation could allow you output the extra column, usually named copy of ..
, which in some case, you might use both of the two columns. Also, sometimes when you load the data from Excel source
, all coming with Unicode
, you need to use Data conversion
to do the data TF, etc.
Also, just FYI, you could also use Derived Column
TF to convert the data type.
UPDATE [Need to be further confirmed]:
From the flat file source connection manager
, the maximum length of string type is 255, while in the Data Conversion
it could be set over 255.
Upvotes: 2