MilkTea
MilkTea

Reputation: 73

SSIS Combining CSV files

Im kinda new to all the SSIS stuff. And im stuck with it. i want to combine multiple CSV files and then put them into a database. The files all have the same info. Examples:

File 1

   Week         Text1
 22-10-2018      58
 29-10-2018      12

File 2

   Week         Text2
 22-10-2018      55
 29-10-2018      48

File 3

   Week         Text3
 22-10-2018      14
 29-10-2018      99

Expected result: Result in DB

   Week         Text1    Text2    Text3
 22-10-2018      58        55       14
 29-10-2018      12        48       99

I got this far by selecting the documents, use a sort and then a join merge. For 3 documents this took me 3 sorts and 2 join merge's. I have to do this for about 86 documents.. there has to be an easier way.

Thanks in advance.

Upvotes: 0

Views: 687

Answers (2)

Nick.Mc
Nick.Mc

Reputation: 19184

I agree with KeithL, I recommend that your final table look like this:

Week        Outcome    Value      DateModified
=======================================================
22-10-2018     AI       58       2018-10-23 20:49
29-10-2018     AI       32       2018-10-23 20:49
22-10-2018    Agile     51       2018-10-23 20:49
29-10-2018    Agile     22       2018-10-23 20:49

If you want to pivot Weeks or outcomes, do it in your reporting tool.

Don't create tables with dynamic named columns - that's a bad idea

Anyway here is an approach that uses a staging table.

Create a staging table that your file will be inserted into:

Script 1:

 CREATE TABLE Staging (
      [Week]         VARCHAR(50),
      Value          VARCHAR(50),
      DateModified   DATETIME2(0) DEFAULT(GETDATE())
     )

Import the entire file in, including headings. In other words, when defining the file format, don't tick 'columns in first row'

We do this for two reasons:

  1. SSIS can't import files with with different heading names using the same data flow
  2. We need to capture the heading name in our staging table

After you import a file your staging table looks like this:

Week          Value    DateModified   
=======================================
Week          Agile    2018-10-23 20:49    
22-10-2018      58     2018-10-23 20:49
29-10-2018      32     2018-10-23 20:49

Now select out the data in the shape we want to load it in. Run this in your database after importing the data to check:

Script 2:

SELECT Week, Value,
(SELECT TOP 1 Value FROM Staging WHERE Week = 'Week') Outcome
FROM staging 
WHERE Week <> 'Week'

Now add an INSERT and some logic to stop duplicates. Put this into an execute SQL task after the data import

Script 3:

WITH SRC As (
SELECT Week, Value,
(SELECT TOP 1 Value FROM Staging WHERE Week = 'Week') Outcome
FROM staging As SRC
WHERE Week <> 'Week'
)

INSERT INTO FinalTable (Week,Value, Outcome)
select  Week, Value, Outcome
FROM SRC
WHERE NOT EXISTS (
    SELECT * FROM FinalTable TGT 
    WHERE TGT.Week = SRC.Week 
    AND TGT.Outcome = SRC.Outcome
    )

Now you wrap this up in a for each file loop that repeats this for each file in the folder. Don't forget that you need to TRUNCATE TABLE staging before importing each file.

In Summary:

  1. Set up a for each file iterator
  2. Inside this goes:
    • A SQL Task with TRUNCATE TABLE Staging;
    • A data flow to import the text file from the iterator into the staging table
    • A SQL Task with Script 3 in it

I've put the DateModified columns in the tables to help you troubleshoot.

  • Good things: you can run this over and over and reimport the same file and you won't get duplicates
  • Bad thing: Possibility of cast failures when inserting VARCHAR into DATE or INT

Upvotes: 1

KeithL
KeithL

Reputation: 5594

You can read your file(s) using a simple C# script component (Source).

You need to add your 3 columns to output0.

  1. Week as DT_Date
  2. Type as DT_STR
  3. Value as DT_I4

    string[] lines = System.IO.File.ReadAllLines([filename]);
    int ctr = 0;
    string type;
    
    foreach(string line in lines)
    {
       string[] col = line.Split(',');
       if(ctr==0) //First line is header
       {
          type = col[1];
       }
       else
       {
          Output0Buffer.AddRow();
          Output0Buffer.Week = DateTime.Parse(col[0]);
          Output0Buffer.Type = type;
          Output0Buffer.Value = int.Parse(col[1]);
       }
       ctr++;
    }
    

After you load to a table you can always create a view with a dynamic pivot.

Upvotes: 1

Related Questions