Reputation: 274
I would like to import a flat file *.csv in SSIS. But one field is a multiline text. I do not have special record delimiter (and there is no way to get one), which is therefore the carriage return \r\n or CRLF.
The problem is : when SSIS meets a CRLF in a multiline field, he passes to the next line instead of continuing as the multiline field.
Here is the header and some first lines :
"name", "firstname", "description", "age"
"John", "Smith", "blablablablablabla", 25
"Fred", "Gordon", "blablabla
blablablabla", 33
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
This example above contains 1 header and 3 records. SSIS understands it as 1 header and 6 records and then get errors, of course.
I don't know how can i handle that problem.
Hope you should help me.
Upvotes: 1
Views: 5279
Reputation: 526
Given that the source of the text files cannot be contacted and that the number of columns in each csv will vary, the best option for performing an import is to proceed on a variation of option 2 of Answer #1. This will require some customization and the application of a script task in the control flow.
Add the following C# to the script task and remember to replace at the top the assignments for source File and destination File. They should be set equal to the new user variables just created.
using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Diagnostics; using System.IO; using System.Data;namespace ConsoleApplication1 { class Program { static void Main(string[] args) { string sourceFile = @"C:\test\tempfile.csv"; string line; int count = 0; int commaCount = 0; int HeaderCommaCount = 0; string templine; string destinationFile = @"C:\test\destFile.csv"; List lines = new List();
// Delete temporary destination file if it already exists if (File.Exists(destinationFile)) { File.Delete(destinationFile); } // Create temporary destination file File.Create(destinationFile).Dispose(); if (File.Exists(sourceFile)) { StreamReader file = null; try { file = new StreamReader(sourceFile); while ((line = file.ReadLine()) != null) { // If Header line, get the number of commas. This is the base by which all following rows will be compared. if (count == 0) { HeaderCommaCount = line.Split(',').Length - 1; lines.Add(line); //save to a string array count++; } else // This is any row following header row { commaCount = line.Split(',').Length - 1; if (commaCount == HeaderCommaCount) //Row following header contains the correct number of columns { lines.Add(line); //save to a string array count++; } else { templine = line; // If comma count is less than that of Header row, continue reading rows until it does and then write. while (commaCount != HeaderCommaCount) { line = file.ReadLine(); templine = templine + " " + line; commaCount = templine.Split(',').Length - 1; line = templine; if (commaCount == HeaderCommaCount) { lines.Add(line); //save to a string array } } } } } } finally { if (file != null) file.Close(); } } File.WriteAllLines(destinationFile, lines); //send contents of string array to destination file. //Console.ReadLine(); } }
}
I wrote this quickly as a console application so that it would be easier to convert over to a C# script task. The file tested successfully where I applied your initial file example. It will iterate through the source text file and concatenate the lines together that have been split apart and then save to a destination file. The destination file is recreated and populated each time it is run. You can test this out first as a console application in Visual Studio and also apply a console.writeline(line) command just above or below where you see the lines.Add(line) in the code.
After this, all you need to do is import from the temporary destination file to your database.
Hope this helps.
Upvotes: 1
Reputation: 526
According to your example, the Description field values can contain multiple carriage returns that is causing the creation of new lines.
The following record appearing on multiple lines...
"Bill", "Buffalo", "bllllllllllllaaaaaaa
blaaaaaaa
blaalalalaaaaaaaaaa", 44
should appear like that below for SSIS to see the expected number of columns.
"Bill", "Buffalo", "bllllllllllllaaaaaaa blaaaaaaa blaalalalaaaaaaaaaa", 44
There are a couple of approaches to resolving the formatting issue.
If possible, the easiest approach is to follow up with the person who created the file and have them do it correctly. For example, assuming they're using SQL Server, then they can apply the following in their TSQL statement for the description field to replace the carriage returns with a blank. (Oracle also has a similar function.)
REPLACE(Description, CHAR(13),' ')
If you need to replace a line feed, then use CHAR(10).
If you are looking at setting up the SSIS package in a job, then you can write a script task in the early part of your control flow that will do the same thing and bypass Excel. The VB code provided in the link can be easily adapted to a script task.
Hope this helps.
Upvotes: 1