Reputation: 55
I'm working in closed envrionment where I cannot install additional packages and have limited ability to use .Net framework classes. Plus I have no control over the CSV file format that I'm receiving.
I receive a CSV file that must be pulled into our business system and updates the database.
I can pull the file in to a DataTable via the below code ...
CSV File Ex:
Order# Qty Description ...
12345 3 desc1, desc2, desc3, etc..
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(',');
DataRow dr = dt.NewRow();
for (int i = 0; i < rows.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
However, the problem is that one field in the CSV file is a description that contains multiple "," characters. Doing the above loads each comma separated word set in the description value into its own index in the rows array.
Currently there should be a total of 10 columns in the csv file but with the description field issue the number of columns vary depending on the length/number of commas in the description field...10, 15, 22 columns etc.
I have no control over the format of the CSV file before it's sent. Is there any way to get around this. Even skipping over this field when creating the DataTable would be fine for my purposes.
Thanks
Upvotes: 1
Views: 631
Reputation: 55
My Solution that ended up working
while (!sr.EndOfStream)
{
string[] rows = sr.ReadLine().Split(',');
var fullrow = String.Empty;
foreach (var entry in rows)
{
fullrow += entry.ToString() + ",";
}
var startQuote = fullrow.IndexOf("\"");
var endQuote = fullrow.IndexOf("\"", startQuote + 1); //LastIndexOf("\"");
if (startQuote > -1 && endQuote > -1)
{
var substring = fullrow.Substring(startQuote, Math.Abs(startQuote - endQuote));
substring = substring.Replace(',', ' ');
fullrow = fullrow.Remove(startQuote, Math.Abs(startQuote - endQuote)).Insert(startQuote, substring);
}
rows = fullrow.Split(',');
DataRow dr = dt.NewRow();
for (int i = 0; i < rows.Length; i++)
{
dr[i] = rows[i];
}
dt.Rows.Add(dr);
}
Thanks @Michael Gorsich for the alternate code!
Upvotes: 0
Reputation: 337
Fstagger, this should work for you assuming you have only one column with internal comma's and the CSV is formed properly (especially if the Description field begins with ," and ends with ",. You need to replace my example INDEX_OF_DESCRIPTION with the actual value.
int iDescStart = 0;
int iDescEnd = 0;
string zLine = "";
const int INDEX_OF_DESCRIPTION = 3;
const char SEPARATOR = '\u001F'; //ASCII Unit Separator, decimal 31
while(!sr.EndOfStream){
zLine = sr.ReadLine();
iDescStart = zLine.IndexOf(",\"");
iDescEnd = zLine.IndexOf("\",");
zLine = zLine.Substring(0, iDescStart)
+ ","
+ zLine.Substring(iDescStart + 2, iDescEnd - iDescStart - 2).Replace(',', SEPARATOR)
+ ","
+ zLine.Substring(iDescEnd + 2);
string[] zaFields = zLine.Split(',');
zaFields[INDEX_OF_DESCRIPTION] = zaFields[INDEX_OF_DESCRIPTION].Replace(SEPARATOR, ',');
datarow dr = dt.NewRow();
for (int i = 0; i < zaFields.Length; i++){
dr[i] = zaFields[i];
}
dt.Rows.Add(dr);
}
Let me know if this works for you : )
Upvotes: 0
Reputation: 782
You can use textqualifier to enclose every field so that the commas or semicolons are not considered as delimeters. The following method should fix the problem.
Install-Package CsvHelper
public static DataTable ReadCSVToDataTable(string path)
{
CsvHelper.Configuration.CsvConfiguration config = new CsvHelper.Configuration.CsvConfiguration();
config.Delimiter = delimeter;
config.Encoding = new UTF8Encoding(false);
if (string.IsNullOrEmpty(textQualifier))
{
config.QuoteAllFields = false;
}
else
{
char qualifier = textQualifier.ToCharArray()[0];
config.Quote = qualifier;
config.QuoteAllFields = true;
}
DataTable dt = new DataTable();
using (var sr = new StreamReader(path))
{
using (var reader = new CsvReader(sr, config))
{
int j = 0;
while (reader.Read())
{
if (j == 0)
{
if (config.HasHeaderRecord)
{
foreach (string header in reader.FieldHeaders)
dt.Columns.Add(header);
}
else
{
for (int i = 0; i < reader.CurrentRecord.Length; i++)
dt.Columns.Add();
}
j++;
}
AddRow(dt, reader);
}
}
}
return dt;
}
Upvotes: 1
Reputation: 423
It looks like your CSV has fixed size columns padded with spaces. So I guess you'd be better off reading a fixed amount of characters for each column and trim the trailing spaces, instead of splitting with comma.
Upvotes: 0