Reputation: 12576
I'm using CsvHelper v26.1.0 to read the following text file delimited by ~:
123~John
234~Joe "Public"
But the double quotes in the file are causing CsvHelper to treat the field as bad data. I tested it by removing the double quotes and it worked fine. The question is: I already set a custom delimiter, so why are the double quotes still causing a problem?
public class AccountDtoMap : ClassMap<AccountDto>
{
    public AccountDtoMap()
    {
        Map(m => m.Number).Index(0);
        Map(m => m.Name).Index(1);
    }
}

var cfg = new CsvHelper.Configuration.CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = "~",
    HasHeaderRecord = false,
    MissingFieldFound = (context) => { errs.Add($"{typeof(T)} missing field: {context.Context.Parser.RawRecord}"); },
    BadDataFound = (context) => { errs.Add($"{typeof(T)} bad data: {context.RawRecord}"); },
};

using (var csv = new CsvReader(new StreamReader(file), cfg))
{
    csv.Context.RegisterClassMap<AccountDtoMap>();
    return csv.GetRecords<T>().ToList();
}
Runnable demo here.
Upvotes: 4
Views: 5656
Reputation: 116795
To parse the CSV shown in your question (in version 26.1.0), you need to properly configure all of the following CsvConfiguration settings, not just the delimiter:
Delimiter. The character used to delimit fields in a single CSV line. (Usually , but here ~.)
Escape, default ". The character used to precede some other character that needs escaping.
Quote, default ". The character used to wrap a field that needs quotes at the beginning and end, as per RFC 4180.
The function of the three character settings above is explained in the comments for the CsvMode enum, which is selected through the Mode setting:
public enum CsvMode
{
    /// Uses RFC 4180 format (default).
    /// If a field contains a CsvConfiguration.Delimiter or CsvConfiguration.NewLine,
    /// it is wrapped in CsvConfiguration.Quote's.
    /// If quoted field contains a CsvConfiguration.Quote, it is preceded by CsvConfiguration.Escape.
    RFC4180 = 0,

    /// Uses escapes.
    /// If a field contains a CsvConfiguration.Delimiter, CsvConfiguration.NewLine,
    /// or CsvConfiguration.Escape, it is preceded by CsvConfiguration.Escape.
    /// Newline defaults to \n.
    Escape,

    /// Doesn't use quotes or escapes.
    /// This will ignore quoting and escape characters. This means a field cannot contain a
    /// CsvConfiguration.Delimiter, CsvConfiguration.Quote, or
    /// CsvConfiguration.NewLine, as they cannot be escaped.
    NoEscape
}
The field Joe "Public" contains embedded " characters, which by default are both the quote and the escape character, and they are not themselves escaped. That is what causes CsvHelper to report bad data. To avoid the error, you have several possible options, including:
Set Mode = CsvMode.NoEscape to completely disable escaping and quoting:
var cfg = new CsvHelper.Configuration.CsvConfiguration(CultureInfo.InvariantCulture)
{
    Mode = CsvMode.NoEscape,
    // Remainder unchanged.
};
Of course, if you do this, your CSV file cannot contain delimiters or newlines embedded in fields.
Demo fiddle #1 here.
Set Mode = CsvMode.Escape to disable wrapping of fields in quotes, and set Escape to some other character such as \ or \t that you do not expect to encounter in the file in practice:
var cfg = new CsvHelper.Configuration.CsvConfiguration(CultureInfo.InvariantCulture)
{
    Mode = CsvMode.Escape,
    Escape = '\\',
    // Remainder unchanged.
};
Even if you do this, delimiters, escape and newline characters inside CSV fields must still be properly escaped using the selected escape character.
Demo fiddle #2 here.
Set Mode = CsvMode.Escape and fix your file to properly escape the escape characters:
234~Joe ""Public""
Demo fiddle #3 here.
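To compare the options side by side, here is a minimal sketch (not taken from the demo fiddles) that parses the sample data under the default configuration and under each of the three options. It assumes the CsvHelper NuGet package is referenced. The ModeComparison class, the Run helper and the in-memory sample strings are made up for illustration; the configuration properties are the ones discussed above.

```csharp
// Requires the CsvHelper NuGet package (the question uses v26.1.0).
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

public static class ModeComparison
{
    // Sample data from the question: '~' delimited, with unescaped double quotes.
    private const string Original = "123~John\n234~Joe \"Public\"\n";

    // Same data with each double quote doubled, as suggested in option 3.
    private const string Doubled = "123~John\n234~Joe \"\"Public\"\"\n";

    public static void Main()
    {
        Run("Default (RFC4180)", Original, CsvMode.RFC4180, '"');
        Run("Option 1: NoEscape", Original, CsvMode.NoEscape, '"');
        Run("Option 2: Escape, backslash as escape", Original, CsvMode.Escape, '\\');
        Run("Option 3: Escape, doubled quotes in file", Doubled, CsvMode.Escape, '"');
    }

    // Hypothetical helper: parses the given text with the given mode and
    // escape character, printing each record and the number of times the
    // BadDataFound callback was invoked.
    private static void Run(string label, string data, CsvMode mode, char escape)
    {
        var badData = new List<string>();
        var cfg = new CsvConfiguration(CultureInfo.InvariantCulture)
        {
            Delimiter = "~",
            HasHeaderRecord = false,
            Mode = mode,
            Escape = escape,
            BadDataFound = args => badData.Add(args.RawRecord),
        };

        Console.WriteLine($"--- {label} ---");
        using (var csv = new CsvReader(new StringReader(data), cfg))
        {
            while (csv.Read())
            {
                Console.WriteLine($"[{csv.GetField(0)}] [{csv.GetField(1)}]");
            }
        }
        Console.WriteLine($"Bad data callbacks: {badData.Count}");
    }
}
```

If the options behave as described above, only the default configuration should report bad data for the second record; the exact number of callbacks may vary between CsvHelper versions.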
Upvotes: 8