Reputation: 69
As part of a data cleansing exercise I need to correct the formatting of a csv file.
Due to poor formatting/lack of quotes an extra comma in a description field is breaking my DTS package.
So, to get around this I have created a simple C# script to find any line in the csv that contains more columns than the header row.
When the row contains more columns than the header I want to merge array item [10] and [11] into one column and then write the line to my new file - keeping all the other existing columns as they are.
Code:
var columns = splitExpression.Split(line).Where(s => s != delimiter).ToArray();
if (headers == null) headers = new string[columns.Length];
if (columns.Length != headers.Length)
{
// TODO - Linq to write comma separated string but merge column 10 and 11 of the array
// writer.WriteLine(string.Join(delimiter, columns));
}
else
{
writer.WriteLine(string.Join(delimiter, columns));
}
Unfortunately, my Linq writing skills are somewhat lacking, can someone please help me fill in the TODO.
Upvotes: 2
Views: 372
Reputation: 236208
Simply use for columns list instead of array. That will allow you to remove unnecessary columns after merge:
var columns = splitExpression.Split(line).Where(s => s != delimiter).ToList();
if (headers == null) headers = new string[columns.Count];
if (columns.Count != headers.Length)
{
columns[10] = columns[10] + columns[11]; // combine columns here
columns.RemoveAt(11);
}
writer.WriteLine(string.Join(delimiter, columns));
Upvotes: 3