dramixx
dramixx

Reputation: 274

Import Flat File containing multiline fields in SSIS

I would like to import a flat file *.csv in SSIS. But one field is a multiline text. I do not have special record delimiter (and there is no way to get one), which is therefore the carriage return \r\n or CRLF.

The problem is : when SSIS meets a CRLF in a multiline field, he passes to the next line instead of continuing as the multiline field.

Here is the header and some first lines :

"name", "firstname", "description", "age"
"John", "Smith", "blablablablablabla", 25
"Fred", "Gordon", "blablabla
blablablabla", 33
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44

This example above contains 1 header and 3 records. SSIS understands it as 1 header and 6 records and then get errors, of course.

I don't know how can i handle that problem.

Hope you should help me.

Upvotes: 1

Views: 5279

Answers (2)

user3662215
user3662215

Reputation: 526

Given that the source of the text files cannot be contacted and that the number of columns in each csv will vary, the best option for performing an import is to proceed on a variation of option 2 of Answer #1. This will require some customization and the application of a script task in the control flow.

  • On the server where the SSIS package will be running, create a bucket folder where a temporary text file will be saved. Each time a CSV file is processed, a temporary file called "destFile.csv" will be created from it and this is what you will import. Each time a different csv file is processed by the script task, it will save to this temporary file and location.
  • Create two variables in the SSIS package. One for the source file and the second for the destination file.
  • Create a script task and define the two variables being sent to it.
  • Add the following C# to the script task and remember to replace at the top the assignments for source File and destination File. They should be set equal to the new user variables just created.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Threading.Tasks;
    using System.Diagnostics;
    using System.IO;
    using System.Data;

    namespace ConsoleApplication1 { class Program { static void Main(string[] args) { string sourceFile = @"C:\test\tempfile.csv"; string line; int count = 0; int commaCount = 0; int HeaderCommaCount = 0; string templine; string destinationFile = @"C:\test\destFile.csv"; List lines = new List();

    // Delete temporary destination file if it already exists if (File.Exists(destinationFile)) { File.Delete(destinationFile); } // Create temporary destination file File.Create(destinationFile).Dispose(); if (File.Exists(sourceFile)) { StreamReader file = null; try { file = new StreamReader(sourceFile); while ((line = file.ReadLine()) != null) { // If Header line, get the number of commas. This is the base by which all following rows will be compared. if (count == 0) { HeaderCommaCount = line.Split(',').Length - 1; lines.Add(line); //save to a string array count++; } else // This is any row following header row { commaCount = line.Split(',').Length - 1; if (commaCount == HeaderCommaCount) //Row following header contains the correct number of columns { lines.Add(line); //save to a string array count++; } else { templine = line; // If comma count is less than that of Header row, continue reading rows until it does and then write. while (commaCount != HeaderCommaCount) { line = file.ReadLine(); templine = templine + " " + line; commaCount = templine.Split(',').Length - 1; line = templine; if (commaCount == HeaderCommaCount) { lines.Add(line); //save to a string array } } } } } } finally { if (file != null) file.Close(); } } File.WriteAllLines(destinationFile, lines); //send contents of string array to destination file. //Console.ReadLine(); } }

    }

I wrote this quickly as a console application so that it would be easier to convert over to a C# script task. The file tested successfully where I applied your initial file example. It will iterate through the source text file and concatenate the lines together that have been split apart and then save to a destination file. The destination file is recreated and populated each time it is run. You can test this out first as a console application in Visual Studio and also apply a console.writeline(line) command just above or below where you see the lines.Add(line) in the code.

After this, all you need to do is import from the temporary destination file to your database.

Hope this helps.

Upvotes: 1

user3662215
user3662215

Reputation: 526

According to your example, the Description field values can contain multiple carriage returns that is causing the creation of new lines.

The following record appearing on multiple lines...

"Bill", "Buffalo", "bllllllllllllaaaaaaa 
blaaaaaaa
blaalalalaaaaaaaaaa", 44

should appear like that below for SSIS to see the expected number of columns.

"Bill", "Buffalo", "bllllllllllllaaaaaaa blaaaaaaa blaalalalaaaaaaaaaa", 44

There are a couple of approaches to resolving the formatting issue.

  1. If possible, the easiest approach is to follow up with the person who created the file and have them do it correctly. For example, assuming they're using SQL Server, then they can apply the following in their TSQL statement for the description field to replace the carriage returns with a blank. (Oracle also has a similar function.)

    REPLACE(Description, CHAR(13),' ')

If you need to replace a line feed, then use CHAR(10).

  1. Otherwise, I understand that contacting the source of the file is not always possible. In this case, you can modify the text file programmatically before feeding it into SSIS. The following link discusses how to apply Excel to do this where you can then save to a new csv file and then import that through SSIS.

http://www.mrexcel.com/forum/excel-questions/304939-importing-text-data-carriage-returns-into-excel.html

If you are looking at setting up the SSIS package in a job, then you can write a script task in the early part of your control flow that will do the same thing and bypass Excel. The VB code provided in the link can be easily adapted to a script task.

Hope this helps.

Upvotes: 1

Related Questions