user1694775

TextFieldParser and invalid rows

At our company, we receive distribution feeds from vendors as CSV files. However, the vendors do not escape the quotation marks inside their text fields, which causes TextFieldParser to reject several lines.

An example of a bad line:

"CABLES TO GO","87029","5.0200","47","757120870296","87029","WP SGL ALUM 1 1/2" GROMMET"

The corresponding code snippet is:

// Parses a single CSV line into its fields with TextFieldParser.
// Needs: System, System.Collections.Generic, System.IO, System.Text, Microsoft.VisualBasic.FileIO
private static IEnumerable<string> ParseHelper(string line, int lineRead, Encoding enc)
{
    // Disposing the parser also releases the underlying MemoryStream.
    using (var readerTemp = new TextFieldParser(new MemoryStream(enc.GetBytes(line)), enc))
    {
        readerTemp.CommentTokens = new[] { "#" };
        readerTemp.SetDelimiters(new[] { "," });
        readerTemp.HasFieldsEnclosedInQuotes = true;
        readerTemp.TextFieldType = FieldType.Delimited;
        readerTemp.TrimWhiteSpace = true;
        try
        {
            return readerTemp.ReadFields();
        }
        catch (MalformedLineException ex)
        {
            // Keep the original exception as InnerException rather than pasting it into the message.
            throw new MalformedLineException(String.Format(
                "Line {0} is not valid and will be skipped: {1}",
                lineRead, readerTemp.ErrorLine), ex);
        }
    }
}

The vendor is also unable to change the source file to escape these quotes. What is the best workaround for lines like this?

Upvotes: 0

Views: 463

Answers (1)

Sam Axe

Reputation: 33738

There is no work-around.

The CSV spec (RFC 4180) uses unescaped quotation marks only to enclose field values; a quotation mark inside a quoted field must be escaped by doubling it. If they are handing you files with unescaped quotation marks within the field value, you have a problem.

These are not CSV files (they violate the spec and are thus not what you think they are).

If you insist on attempting to parse them as CSV anyway, you can begin by escaping every quotation mark that is not adjacent to a field delimiter or record terminator, that is, neither immediately preceded nor immediately followed by one.

This approach will only go so far. Sometimes corrupted data just can't be uncorrupted.
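
A minimal sketch of that pre-processing step, under some loud assumptions: the delimiter is a bare comma with no surrounding whitespace, every field is quoted, each record fits on one line, and the input contains no quotes that are already escaped (an existing "" pair would be doubled again). QuoteRepair and EscapeInteriorQuotes are hypothetical names, not part of TextFieldParser.

```csharp
using System;
using System.Text;

public static class QuoteRepair
{
    // Hypothetical helper: doubles every quotation mark that is neither at a
    // line boundary nor directly next to a comma, so a CSV parser reads it as
    // a literal quote instead of a field terminator.
    public static string EscapeInteriorQuotes(string line)
    {
        var sb = new StringBuilder(line.Length + 8);
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                // A quote at the start of the line or right after a comma opens a field.
                bool opensField = i == 0 || line[i - 1] == ',';
                // A quote at the end of the line or right before a comma closes a field.
                bool closesField = i == line.Length - 1 || line[i + 1] == ',';
                if (!opensField && !closesField)
                {
                    sb.Append('"'); // emit the extra quote that escapes this one
                }
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With the question's code, the repaired text would be passed on as ParseHelper(QuoteRepair.EscapeInteriorQuotes(line), lineRead, enc). A value such as 5",6" wide would still be misread, because its first quotation mark sits directly before a comma and looks like the end of a field; this is the kind of line that cannot be repaired reliably.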

Upvotes: 1
