gcoleman0828

Reputation: 1542

CSV parser to parse double quotes via OLEDB

How can I use OLEDB to parse and import a CSV file in which every cell is enclosed in double quotes, because some values contain commas? I am unable to change the format, as the file comes from a vendor.

I am trying the following and it is failing with an IO error:

public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
    string fullImportPath = fileDestination + @"\" + fileToImport;
    OleDbDataAdapter dAdapter = null;
    DataTable dTable = null;

    try
    {
        if (!File.Exists(fullImportPath))
            return null;

        string full = Path.GetFullPath(fullImportPath);
        string file = Path.GetFileName(full);
        string dir = Path.GetDirectoryName(full);


        //create the "database" connection string
        string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
          + "Data Source=\"" + dir + "\\\";"
          + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";

        //create the database query
        string query = "SELECT * FROM " + file;

        //create a DataTable to hold the query results
        dTable = new DataTable();

        //create an OleDbDataAdapter to execute the query
        dAdapter = new OleDbDataAdapter(query, connString);


        //fill the DataTable
        dAdapter.Fill(dTable);
    }
    catch (Exception ex)
    {
        throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception: " + ex);
    }
    finally
    {
        if (dAdapter != null)
            dAdapter.Dispose();
    }

    return dTable;
}

When I use a normal CSV (without the quotes) it works fine. Do I need to change something in the connString?

Upvotes: 0

Views: 6281

Answers (7)

Muhammad Mubashir

Reputation: 1657

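// requires: using System.Text.RegularExpressions;
private static string[] Mubashir_CSVParser(string s)
{
    // Split on commas that are followed by an even number of quotes,
    // i.e. commas that are not inside a quoted field.
    Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
    string[] Fields = RegexCSVParser.Split(s);

    // Clean up the fields (remove surrounding quotes and leading spaces)
    for (int i = 0; i < Fields.Length; i++)
    {
        Fields[i] = Fields[i].TrimStart(' ', '"');
        Fields[i] = Fields[i].TrimEnd('"'); // this line removes the trailing quotes
        //Fields[i] = Fields[i].Trim();
    }

    // Return the parsed fields so the caller can use them
    return Fields;
}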
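To illustrate how this kind of regex split behaves on a fully quoted line, here is a minimal self-contained sketch. The class name `QuotedCsvSplitter` and the method name `SplitLine` are hypothetical, chosen only for this example. It assumes one record per line and does not unescape doubled quotes (`""`) or handle line breaks inside a field.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of any library.
public static class QuotedCsvSplitter
{
    // Matches a comma only when an even number of quotes follows it,
    // which means the comma sits outside any quoted field.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    public static string[] SplitLine(string line)
    {
        string[] fields = Separator.Split(line);

        for (int i = 0; i < fields.Length; i++)
        {
            // Strip leading spaces/quotes and trailing quotes from each field.
            fields[i] = fields[i].TrimStart(' ', '"').TrimEnd('"');
        }

        return fields;
    }
}
```

Calling `QuotedCsvSplitter.SplitLine("\"Smith, John\",\"42\",\"New York, NY\"")` should give three fields: `Smith, John`, `42` and `New York, NY`.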

Upvotes: 0

Kim Gentes

Reputation: 1628

There is a lot to consider when handling CSV files. However you extract the data from the file, you should know how the parsing is being handled. There are classes out there that can get you part of the way, but most don't handle the nuances that Excel does with embedded commas, quotes and line breaks. On the other hand, loading Excel or the MS classes seems like a lot of overhead if you just want to parse a text file as CSV.

One thing you can consider is doing the parsing with your own regex, which will also make your code a little more platform independent, in case you need to port it to another server or application at some point. Regex also has the benefit of being available in virtually every language. That said, there are some good regex patterns out there that handle the CSV puzzle. Here is my shot at it, which covers embedded commas, quotes and line breaks. The regex pattern and an explanation are here:

http://www.kimgentes.com/worshiptech-web-tools-page/2008/10/14/regex-pattern-for-parsing-csv-files-with-embedded-commas-dou.html

Hope that is of some help.

Upvotes: 1

Vishal Sen

Reputation: 1175

You can use this code (MS Office required):

private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
{
    string tempPath = System.IO.Path.GetDirectoryName(filePath);
    string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
    DataTable dt = new DataTable();

    // Read the CSV into a DataTable through the ODBC text driver
    using (OdbcConnection conn = new OdbcConnection(strConn))
    using (OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn))
    {
        da.Fill(dt);
    }

    // Bulk-insert the rows into the destination SQL Server table.
    // ConfigurationManager replaces the obsolete ConfigurationSettings class.
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationManager.AppSettings["dbConnectionString"]))
    {
        bulkCopy.DestinationTableName = tableName;
        bulkCopy.BatchSize = 50;
        bulkCopy.WriteToServer(dt);
    }
}

Upvotes: 1

gcoleman0828

Reputation: 1542

Just in case anyone has a similar issue, I wanted to post the code I used. I did end up using TextFieldParser to read the file and parse out the columns, but I am using recursion and substrings to get the rest done.

/// <summary>
/// Parses each string passed as a "row".
/// This routine currently accounts for both double quotes
/// and commas, but can be extended to handle other cases.
/// </summary>
/// <param name="row">String or row to be parsed</param>
/// <returns>The column values of the row, in order</returns>
private List<String> ParseRowToList(String row)
{
    List<String> returnValue = new List<String>();

    if (String.IsNullOrEmpty(row))
    {// Empty column (e.g. the row ended with a comma); avoids indexing row[0]
        returnValue.Add(String.Empty);
        return returnValue;
    }

    if (row[0] == '\"')
    {// Quoted String
        if (row.IndexOf("\",") > -1)
        {// There are more columns
            returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
            returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
        }
        else
        {// This is the last column
            returnValue.Add(row.Substring(1, row.Length - 2));
        }
    }
    else
    {// Unquoted String
        if (row.IndexOf(",") > -1)
        {// There are more columns
            returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
            returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
        }
        else
        {// This is the last column
            returnValue.Add(row.Substring(0, row.Length));
        }
    }

    return returnValue;
}

Then the code for TextFieldParser is:

// string pathFile = @"C:\TestFTP\TestCatalog.txt";
string pathFile = @"C:\TestFTP\SomeFile.csv";

List<String> stringList = new List<String>();
TextFieldParser fieldParser = null;
DataTable dtable = new DataTable();

/* Set up TextFieldParser
 * use the correct delimiter provided
 * and path */
fieldParser = new TextFieldParser(pathFile);
/* Set that there are quotes in the file for fields and or column names */
fieldParser.HasFieldsEnclosedInQuotes = true;

/* delimiter by default to be used first */
fieldParser.SetDelimiters(new string[] { "," });

// Build Full table to be imported
dtable = BuildDataTable(fieldParser, dtable);
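The `BuildDataTable` helper called above is not shown in the post. The following is only a guess at what such a helper might look like, not the original implementation. It assumes the file has no header row (as with `HDR=No` in the question), so the column names `F1`, `F2`, ... are generated, and every value is stored as a string. The class name `CsvTableBuilder` is hypothetical.

```csharp
using System.Data;
using Microsoft.VisualBasic.FileIO; // needs a reference to Microsoft.VisualBasic

// Hypothetical stand-in for the BuildDataTable helper; an assumption, not the author's code.
public static class CsvTableBuilder
{
    public static DataTable BuildDataTable(TextFieldParser fieldParser, DataTable dtable)
    {
        while (!fieldParser.EndOfData)
        {
            // ReadFields splits one record on the configured delimiter and,
            // with HasFieldsEnclosedInQuotes set, strips the enclosing quotes.
            string[] fields = fieldParser.ReadFields();
            if (fields == null)
                break;

            // Add columns on demand so rows of different widths still fit.
            while (dtable.Columns.Count < fields.Length)
                dtable.Columns.Add("F" + (dtable.Columns.Count + 1), typeof(string));

            DataRow row = dtable.NewRow();
            for (int i = 0; i < fields.Length; i++)
                row[i] = fields[i];
            dtable.Rows.Add(row);
        }

        return dtable;
    }
}
```

Because `TextFieldParser` implements `IDisposable`, the caller would normally wrap it in a `using` block so the file handle is released once the table has been filled.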

Upvotes: -1

Oded

Reputation: 499002

Use a dedicated CSV parser.

There are many out there. A popular one is FileHelpers, though there is one hidden in the Microsoft.VisualBasic.FileIO namespace - TextFieldParser.

Upvotes: 3

Joel Coehoorn

Reputation: 415790

Try the code from my answer here:

Reading CSV files in C#

It handles quoted csv just fine.

Upvotes: 0

Robert Harvey

Reputation: 180788

Have a look at FileHelpers.

Upvotes: 1
