abs786123
abs786123

Reputation: 609

text files with variable columns

I have a text file which for example the first row has 3 columns, the second row has 5 columns. When I try get this into a flat file source in SSIS it only sees the 3 columns when the row below it has 5 columns. Is there a way to show the maximum total of columns a given row may have, even as much as 10 columns for one particular row?

Upvotes: 0

Views: 233

Answers (2)

HLGEM
HLGEM

Reputation: 96572

If you have any control over the data provider, please ask them to change how they provide the data so that it is easier to process. For instance, a client may well be willing to change the data especially if you let them know that the extra cost to develop a solution for the bad format will be charged to them. However, if teh file is one that is provided to multiple people like one picked up from a government agency, the chances of getting it changed are small. In any event, it is best to at least try to push this back and let them know it will take longer and cost significantly more to create a method for handling badly formed files.

I once had to process a file like this (it was a government file provided to thousands of different users) and the first field told me what type of data and thus how many columns and which columns it had.

I imported it into a staging table with one column. Then I separated out the data into separate staging tables based on the type of data as determined by the first column. You could use the number of delimiters I guess if your first column has no meaning to help you separate it. Even if they have teh same meaning, you woudl need to write separate SQl for each transform to account for what goes into the fields not provided in the record.

If it is only the first record that is off, sometimes that is a header record showing such things as datetime and number of records produced. In that case you can do a data flow that skips this record.

Upvotes: 1

John Cappelletti
John Cappelletti

Reputation: 81960

Another option is it import the text file as a non-delimited string, and then parse as needed.

Create some Dummy Data

DECLARE @ImportTable TABLE(ImportString varchar(max));
INSERT INTO @ImportTable VALUES
('25,18'),
('15,218,25,689,267,268'),
('615,518,55')

The SQL

Select B.*
 From  @ImportTable A
 Cross Apply (Select * from [dbo].[udf-Str-Parse-Row](A.ImportString,',')) B

Returns

Pos1    Pos2    Pos3    Pos4    Pos5    Pos6    Pos7    Pos8    Pos9    Pos10
25      18      NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
15      218     25      689     267     268     NULL    NULL    NULL    NULL
615     518     55      NULL    NULL    NULL    NULL    NULL    NULL    NULL

The UDF to Parse the Row

CREATE FUNCTION [dbo].[udf-Str-Parse-Row] (@String varchar(max),@Delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse-Row]('Dog,Cat,House,Car',',')
--       Select * from [dbo].[udf-Str-Parse-Row]('John Cappelletti',' ')
--       Select * from [dbo].[udf-Str-Parse-Row]('id26,id46|id658,id967','|')

Returns Table 

As

Return (
    SELECT Pos1 = xDim.value('/x[1]','varchar(250)')
          ,Pos2 = xDim.value('/x[2]','varchar(250)')
          ,Pos3 = xDim.value('/x[3]','varchar(250)')
          ,Pos4 = xDim.value('/x[4]','varchar(250)')
          ,Pos5 = xDim.value('/x[5]','varchar(250)')
          ,Pos6 = xDim.value('/x[6]','varchar(250)')
          ,Pos7 = xDim.value('/x[7]','varchar(250)')
          ,Pos8 = xDim.value('/x[8]','varchar(250)')
          ,Pos9 = xDim.value('/x[9]','varchar(250)')
          ,Pos10 = xDim.value('/x[10]','varchar(250)')
    FROM (Select Cast('<x>' + Replace(@String,@Delimeter,'</x><x>')+'</x>' as XML) as xDim) A
)

I added Pos10, but you can see that you can expand as necessary. Also, you should note that you should probably cast each PosX to an appropriate datatype (i.e. SomeFildName=cast(Pos1 as int)

Upvotes: 0

Related Questions