Reputation: 609
I have a text file which for example the first row has 3 columns, the second row has 5 columns. When I try get this into a flat file source in SSIS it only sees the 3 columns when the row below it has 5 columns. Is there a way to show the maximum total of columns a given row may have, even as much as 10 columns for one particular row?
Upvotes: 0
Views: 233
Reputation: 96572
If you have any control over the data provider, please ask them to change how they provide the data so that it is easier to process. For instance, a client may well be willing to change the data especially if you let them know that the extra cost to develop a solution for the bad format will be charged to them. However, if teh file is one that is provided to multiple people like one picked up from a government agency, the chances of getting it changed are small. In any event, it is best to at least try to push this back and let them know it will take longer and cost significantly more to create a method for handling badly formed files.
I once had to process a file like this (it was a government file provided to thousands of different users) and the first field told me what type of data and thus how many columns and which columns it had.
I imported it into a staging table with one column. Then I separated out the data into separate staging tables based on the type of data as determined by the first column. You could use the number of delimiters I guess if your first column has no meaning to help you separate it. Even if they have teh same meaning, you woudl need to write separate SQl for each transform to account for what goes into the fields not provided in the record.
If it is only the first record that is off, sometimes that is a header record showing such things as datetime and number of records produced. In that case you can do a data flow that skips this record.
Upvotes: 1
Reputation: 81960
Another option is it import the text file as a non-delimited string, and then parse as needed.
Create some Dummy Data
DECLARE @ImportTable TABLE(ImportString varchar(max));
INSERT INTO @ImportTable VALUES
('25,18'),
('15,218,25,689,267,268'),
('615,518,55')
The SQL
Select B.*
From @ImportTable A
Cross Apply (Select * from [dbo].[udf-Str-Parse-Row](A.ImportString,',')) B
Returns
Pos1 Pos2 Pos3 Pos4 Pos5 Pos6 Pos7 Pos8 Pos9 Pos10
25 18 NULL NULL NULL NULL NULL NULL NULL NULL
15 218 25 689 267 268 NULL NULL NULL NULL
615 518 55 NULL NULL NULL NULL NULL NULL NULL
The UDF to Parse the Row
CREATE FUNCTION [dbo].[udf-Str-Parse-Row] (@String varchar(max),@Delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse-Row]('Dog,Cat,House,Car',',')
-- Select * from [dbo].[udf-Str-Parse-Row]('John Cappelletti',' ')
-- Select * from [dbo].[udf-Str-Parse-Row]('id26,id46|id658,id967','|')
Returns Table
As
Return (
SELECT Pos1 = xDim.value('/x[1]','varchar(250)')
,Pos2 = xDim.value('/x[2]','varchar(250)')
,Pos3 = xDim.value('/x[3]','varchar(250)')
,Pos4 = xDim.value('/x[4]','varchar(250)')
,Pos5 = xDim.value('/x[5]','varchar(250)')
,Pos6 = xDim.value('/x[6]','varchar(250)')
,Pos7 = xDim.value('/x[7]','varchar(250)')
,Pos8 = xDim.value('/x[8]','varchar(250)')
,Pos9 = xDim.value('/x[9]','varchar(250)')
,Pos10 = xDim.value('/x[10]','varchar(250)')
FROM (Select Cast('<x>' + Replace(@String,@Delimeter,'</x><x>')+'</x>' as XML) as xDim) A
)
I added Pos10, but you can see that you can expand as necessary. Also, you should note that you should probably cast each PosX to an appropriate datatype (i.e. SomeFildName=cast(Pos1 as int)
Upvotes: 0