Reputation: 16143
To read a CSV file, I use the following statement:
var query = from line in rawLines
let data = line.Split(';')
select new
{
col01 = data[0],
col02 = data[1],
col03 = data[2]
};
The CSV file I want to read is malformed in the way, that an entry can have the separator ;
itself as data when surrounded with qutation marks.
Example:
col01;col02;col03
data01;"data02;";data03
My read statement above does not work here, since it interprets the second row as four columns.
Question: Is there an easy way to handle this malformed CSV correctly? Perhaps with another LINQ query?
Upvotes: 3
Views: 3632
Reputation: 1142
Parsing CSV files manually can always lead to issues like this. I would advise that you use a third party tool like CsvHelper to handle the parsing.
Furthermore, it's not a good idea to explicitly parse commas, as your separator can be overridden in your computers environment options.
Let me know if I can help further,
Matt
Upvotes: 1
Reputation: 1038830
Just use a CSV parser and STOP ROLLING YOUR OWN:
using (var parser = new TextFieldParser("test.csv"))
{
parser.CommentTokens = new string[] { "#" };
parser.SetDelimiters(new string[] { ";" });
parser.HasFieldsEnclosedInQuotes = true;
// Skip over header line.
parser.ReadLine();
while (!parser.EndOfData)
{
string[] fields = parser.ReadFields();
Console.WriteLine("{0} {1} {2}", fields[0], fields[1], fields[2]);
}
}
TextFieldParser
is built in .NET. Just add reference to the Microsoft.VisualBasic
assembly and you are good to go. A real CSV parser will happily handle this situation.
Upvotes: 12
Reputation: 21855
Not very elegant but after using your method you can check if any colxx contains an unfinished quotation mark (single) you can join it with the next colxx.
Upvotes: 0