Reputation: 1542
How can I use OLEDB to parse and import a CSV file that each cell is encased in double quotes because some rows contain commas in them?? I am unable to change the format as it is coming from a vendor.
I am trying the following and it is failing with an IO error:
public DataTable ConvertToDataTable(string fileToImport, string fileDestination)
{
string fullImportPath = fileDestination + @"\" + fileToImport;
OleDbDataAdapter dAdapter = null;
DataTable dTable = null;
try
{
if (!File.Exists(fullImportPath))
return null;
string full = Path.GetFullPath(fullImportPath);
string file = Path.GetFileName(full);
string dir = Path.GetDirectoryName(full);
//create the "database" connection string
string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
+ "Data Source=\"" + dir + "\\\";"
+ "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
//create the database query
string query = "SELECT * FROM " + file;
//create a DataTable to hold the query results
dTable = new DataTable();
//create an OleDbDataAdapter to execute the query
dAdapter = new OleDbDataAdapter(query, connString);
//fill the DataTable
dAdapter.Fill(dTable);
}
catch (Exception ex)
{
throw new Exception(CLASS_NAME + ".ConvertToDataTable: Caught Exception: " + ex);
}
finally
{
if (dAdapter != null)
dAdapter.Dispose();
}
return dTable;
}
When I use a normal CSV it works fine. Do I need to change something in the connString??
Upvotes: 0
Views: 6281
Reputation: 1657
private static void Mubashir_CSVParser(string s)
{
// extract the fields
Regex RegexCSVParser = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");
String[] Fields = RegexCSVParser.Split(s);
// clean up the fields (remove " and leading spaces)
for (int i = 0; i < Fields.Length; i++)
{
Fields[i] = Fields[i].TrimStart(' ', '"');
Fields[i] = Fields[i].TrimEnd('"');// this line remove the quotes
//Fields[i] = Fields[i].Trim();
}
}
Upvotes: 0
Reputation: 1628
There is a lot to consider when handling CSV files. However you extract them from the file, you should know how you are handling the parsing. There are classes out there that can get you part way, but most don't handle the nuances that Excel does with embedded commas, quotes and line breaks. However, loading Excel or the MS classes seems a lot of freaking overhead if you just want parse a txt file like a CSV.
One thing you can consider is doing the parsing in your own Regex, which will also make your code a little more platform independent, in case you need to port it to another server or application at some point. Using regex has the benefit of also being accessible in virtually every language. That said, there are some good regex patterns out there that handle the CSV puzzle. Here is my shot at it, which does cover embedded commas, quotes and line breaks. Regex code/pattern and explanation :
Hope that is of some help..
Upvotes: 1
Reputation: 1175
You can use this code : MS office required
private void ConvertCSVtoExcel(string filePath = @"E:\nucc_taxonomy_140.csv", string tableName = "TempTaxonomyCodes")
{
string tempPath = System.IO.Path.GetDirectoryName(filePath);
string strConn = @"Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" + tempPath + @"\;Extensions=asc,csv,tab,txt";
OdbcConnection conn = new OdbcConnection(strConn);
OdbcDataAdapter da = new OdbcDataAdapter("Select * from " + System.IO.Path.GetFileName(filePath), conn);
DataTable dt = new DataTable();
da.Fill(dt);
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(ConfigurationSettings.AppSettings["dbConnectionString"]))
{
bulkCopy.DestinationTableName = tableName;
bulkCopy.BatchSize = 50;
bulkCopy.WriteToServer(dt);
}
}
Upvotes: 1
Reputation: 1542
Just incase anyone has a similar issue, i wanted to post the code i used. i did end up using Textparser to get the file and parse ot the columns, but i am using recrusion to get the rest done and substrings.
/// <summary>
/// Parses each string passed as a "row".
/// This routine accounts for both double quotes
/// as well as commas currently, but can be added to
/// </summary>
/// <param name="row"> string or row to be parsed</param>
/// <returns></returns>
private List<String> ParseRowToList(String row)
{
List<String> returnValue = new List<String>();
if (row[0] == '\"')
{// Quoted String
if (row.IndexOf("\",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf("\",") + 2));
returnValue.Insert(0, row.Substring(1, row.IndexOf("\",") - 1));
}
else
{// This is the last column
returnValue.Add(row.Substring(1, row.Length - 2));
}
}
else
{// Unquoted String
if (row.IndexOf(",") > -1)
{// There are more columns
returnValue = ParseRowToList(row.Substring(row.IndexOf(",") + 1));
returnValue.Insert(0, row.Substring(0, row.IndexOf(",")));
}
else
{// This is the last column
returnValue.Add(row.Substring(0, row.Length));
}
}
return returnValue;
}
Then the code for Textparser is:
// string pathFile = @"C:\TestFTP\TestCatalog.txt";
string pathFile = @"C:\TestFTP\SomeFile.csv";
List<String> stringList = new List<String>();
TextFieldParser fieldParser = null;
DataTable dtable = new DataTable();
/* Set up TextFieldParser
* use the correct delimiter provided
* and path */
fieldParser = new TextFieldParser(pathFile);
/* Set that there are quotes in the file for fields and or column names */
fieldParser.HasFieldsEnclosedInQuotes = true;
/* delimiter by default to be used first */
fieldParser.SetDelimiters(new string[] { "," });
// Build Full table to be imported
dtable = BuildDataTable(fieldParser, dtable);
Upvotes: -1
Reputation: 499002
Use a dedicated CSV parser.
There are many out there. A popular one is FileHelpers, though there is one hidden in the Microsoft.VisualBasic.FileIO
namespace - TextFieldParser
.
Upvotes: 3
Reputation: 415790
Try the code from my answer here:
It handles quoted csv just fine.
Upvotes: 0