Jeff
Jeff

Reputation: 77

CSV file fix for double quotes within fields that are not always double quoted while inserting into SQL Server

I have a set of daily CSV files received from a 3rd party data source in which fields are not double-quoted unless they contain a comma, HOWEVER someone decided it was a good idea to include double quotes within a field sometimes.

The file contains records like this on occasion, which fails Bulk Inserting into SQL Server on the first row:

4458,123456,Danny "LEE" Smith,123456789122,Pizza Inc
4458,789012,John Baker,987654321,"Company Name, LLC"

I need to update the file programmatically to something like this so that SQL Server's Bulk Insert doesn't fail with row 2's double quotes remaining:

4458,123456,Danny 'LEE' Smith,123456789122,Pizza Inc
4458,789012,John Baker,987654321,"Company Name, LLC"

Is there a programmatic way with say Regex to make the change for the whole file, or will I just have to loop through each line looking for double quotes not preceded or post-ceded by commas?

Upvotes: 0

Views: 2259

Answers (3)

Jeff
Jeff

Reputation: 77

Thanks everyone for the help. I think the fastest solution for what I need in the time frame I have given various issues with a regex solution is to just read the file line by line and if I find a double quote that's not preceded by a comma or followed by a comma, then I'll replace it with a single quote, write the line to a new file, and continue through the file.

Upvotes: 0

Nick.Mc
Nick.Mc

Reputation: 19245

I used this site https://regex101.com/

To test a regex

for this file:

4458,123456,Danny "LEE" Smith,123456789122,Pizza Inc
4458,789012,John Baker,987654321,"Company Name, LLC"
4458,789012,"John,Baker",987654321,Company Name LLC

With this regex:

[^,](")[^[$|,|\n]

I managed to pick out just the quotes around "LEE"

So you can use any regex tool to find and replace those.

Since you mentioned C#, here's a code sample I adapted from here https://www.c-sharpcorner.com/article/c-sharp-regex-examples/

string badString = "<your CSV file>" ;  
string regex = "[^,](\")[^[$|,|\n]";
string CleanedString = Regex.ReplaceGroupValue(badString, regex ,1,"'");  

I also had to look up escaping strings in C#. This is untested but give it a try.

Upvotes: 2

Christoph
Christoph

Reputation: 3641

Maybe something like this:

private static void NormalizeCsv(string inputFile, string outputFile, Encoding encoding) {
    IEnumerable<string> inputLines = File.ReadLines(inputFile, encoding);
    using TextWriter outputLines = new StreamWriter(outputFile, append: false, encoding);
    outputLines.NewLine = "\r\n";
    foreach (string line in inputLines) {
        if (line.IndexOf('"') < 0) {
            outputLines.WriteLine(line);
        } else {
            string[] tokens = line.Split('"', Int32.MaxValue, StringSplitOptions.None);
            for (int i = 0; i < tokens.Length; i++) {
                string token = tokens[i];
                if ((i % 2) == 0) {
                    outputLines.Write(token);
                } else {
                    if (token.IndexOf(',') > -1) {
                        outputLines.Write("\"");
                        outputLines.Write(token);
                        outputLines.Write("\"");
                    } else {
                        outputLines.Write("'");
                        outputLines.Write(token);
                        outputLines.Write("'");
                    }
                }
            }
            outputLines.WriteLine();
        }
    }
}

Upvotes: 1

Related Questions