Reputation: 83
Googled for some tips on how to crack this. But did not get any helpful hits. Now, I wonder if I can achieve the same in SSIS or not.
There are multiple .csv files in a folder. What I am trying to achieve is to:
and re-start the entire above process until all the .csv files in the folder are processed.
Initially I thought I can use the For Each Loop Container and Execute process Task to achieve this. However, not able to find any resource as to how to achieve the above desired objective.
Example: Header of every Source .csv file:
CODE | NAME | Value 1 | Value 2 | Value 3 | DATE | QTY | PRICE | VALUE_ADD | ZONE
I need to delete columns: NAME | VALUE_ADD | ZONE from each file and re-arrange the columns in the below order.
Desired column order:
CODE | DATE| Value 1 | Value 2 | Value 3 | PRICE | QTY
I know this is possible within SSIS. But am not able to figure it out. Thanks for your help in advance.
Upvotes: 1
Views: 1886
Reputation: 83
After a lot of experimenting, managed to get the desired result. In the end, it seemed so simple.
My main motive for creating this package was that I had a lot of .csv files that needed the laborious task of opening each file and running a macro that eliminated a couple of columns, rearranged the remaining columns in the desired format. Then I had to manually save each of the files after clicking on the Excel Confirmation boxes. That was becoming too much. I wanted just a one click approach.
Giving a detailed way of what I did. Hope it helps people who are tying to get data from multiple .csv files as source, then get only the desired columns in the order they need, and finally save the desired output as .csv files into a new destination.
In brief, all I had to use was use:
And within the Data Flow Task:
Also, had to use 3 Variables - all String Data Types with Project Scope - which I named: CurrFileName, DestFilePath, and FolderPath.
.
Detailed Steps:
Set default values to the variables:
CurrFileName: Just provide the name of one of the .scv files (test.csv) for temporary purpose.
FolderPath: Provide the path where your source .csv files are located (C:\SSIS\Data\Input)
DestFilePath: Provide the Destination path where you want to save the processed files (C:\SSIS\Data\Input\Output)
Step 1: Drag a For Each Loop Container to the Control Flow area.
Step 2: In collection, select the enumerator as 'Foreach File Enumerator'.
Step 3: Under Enumerator Configuration, under Folder: provide the folder path where the .csv files are located (In my case, C:\SSIS\Data\Input) and in Files:, provide the extension (in our case: *.csv)
Step 4: Under Retrieve file name, select 'Name and extension' radio button.
Step 5: Then go to the Variable Mappings section and select the Variable (in my case: User::CurrFileName.
Step 6: Create the source connection (let's call it SrcConnection)- right-click in the Connection Managers area and select the Flat File Connection manager and select one of the .csv files (for temporary purpose). Go to the Advanced tab and provide the correct desired data type for the columns you wish to keep. Click OK to exit.
Step 7: Then go to the Properties of this newly created source Flat File Connection and click the small box adjacent to the Expressions field to open the Property Expressions Editor. under 'Property', select 'ConnectionString' and in the Expression space, enter: @[User::FolderPath] + "\" + @[User::CurrFileName] and click OK to exit.
Step 8: In Windows Explorer, create a new folder inside your Source folder (in our case: C:\SSIS\Data\Input\Output)
Step 9: Create the Destination connection (let's call it DestConnection) - right-click in the Connection Managers area and select the Flat File Connection manager and select one of the .csv files (for temporary purpose). Go to the Advanced tab and provide the correct desired data type for the columns you wish to keep. Click OK to exit.
Step 10: Then go to the Properties of this newly created source Flat File Connection and click the small box adjacent to the Expressions field to open the Property Expressions Editor. under 'Property', select 'ConnectionString' and in the Expression space, enter: @[User::DestFilePath] + @[User::CurrFileName] and click OK to exit.
Step 11: Drag the Data Flow Task to the Foreach Loop Container.
Step 12: In the Data Flow Task, drag a Flat File Source and in the Flat file connection manager: select the source connection (in this case: SrcConnection). In Columns, de-select all the columns and select only the columns that you require (in the order that you require) and click OK to exit.
Step 13: Drag a Flat File Destination to the Data Flow Task and in the Flat File Connection manager: select the destination connection (in this case: DestConnection). Then, go to the Mappings section and verify if the mappings are as per desired output. Click OK to exit.
Step 14: That's it. Execute the package. it should execute without any trouble.
Hope this helped :-)
Upvotes: 1
Reputation: 16240
It isn't clear why you want to use SSIS to do this: your task seems to be to manipulate text files outside the database, and it's usually much easier to do this in a small script or program written in a language with good CSV parsing support (Perl, Python, PowerShell, whatever). If this should be part of a larger package then you can simply call the script using an Execute Process task. SSIS is a great tool, but I find it quite awkward for a task like this.
Upvotes: 0
Reputation: 8113
Easily done using the following four steps :
Et voilà!
Upvotes: 1