Reputation: 2669
As part of a recent project I had to read and write a CSV file and display its contents in a grid view in C#. In the end I decided to use a ready-built parser to do the work for me.
Because I like to do that kind of stuff, I wondered how to go about writing my own.
So far all I've managed to do is this:
// Read the header
using (StreamReader reader = new StreamReader(dialog.FileName))
{
    string row = reader.ReadLine();
    string[] cells = row.Split(',');
    // Create one dataGridView column per header cell, including the last one
    for (int i = 0; i < cells.Length; i++)
    {
        DataGridViewTextBoxColumn column = new DataGridViewTextBoxColumn();
        column.Name = cells[i];
        column.HeaderText = cells[i];
        dataGridView1.Columns.Add(column);
    }
    // Display the contents of the file
    while (reader.Peek() != -1)
    {
        row = reader.ReadLine();
        cells = row.Split(',');
        dataGridView1.Rows.Add(cells);
    }
}
My question: is carrying on like this a wise idea, and if it is (or isn't) how would I test it properly?
Upvotes: 6
Views: 3247
Reputation: 5386
This comes from http://www.gigawebsolution.com/Posts/Details/61/Building-a-Simple-CSV-Parser-in-C#
using System.Collections.Generic;
using System.IO;
using System.Text;

public interface ICsvReaderWriter
{
    List<string[]> Read(string filePath, char delimiter);
    void Write(string filePath, List<string[]> lines, char delimiter);
}

public class CsvReaderWriter : ICsvReaderWriter
{
    // Reads every non-empty line and splits it at the delimiter (no quote handling)
    public List<string[]> Read(string filePath, char delimiter)
    {
        var fileContent = new List<string[]>();
        using (var reader = new StreamReader(filePath, Encoding.Unicode))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (!string.IsNullOrEmpty(line))
                {
                    fileContent.Add(line.Split(delimiter));
                }
            }
        }
        return fileContent;
    }

    // Appends the lines to the file, one row per line
    public void Write(string filePath, List<string[]> lines, char delimiter)
    {
        using (var writer = new StreamWriter(filePath, true, Encoding.Unicode))
        {
            foreach (var line in lines)
            {
                // Join the columns with the delimiter; trailing empty columns are preserved
                writer.WriteLine(string.Join(delimiter.ToString(), line));
            }
        }
    }
}
Upvotes: 0
Reputation: 67345
Parsing a CSV file isn't difficult, but it involves more than simply calling String.Split().
You are breaking the lines at each comma, but fields can contain embedded commas. In these cases, CSV wraps the field in double quotes, so you must also look for double quotes and ignore commas within them. Fields can even contain embedded double quotes: these must appear inside a quoted field and be "doubled up" to indicate that the quote is a literal character.
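As a rough illustration of the quoting rules just described, here is a minimal sketch of a single-line splitter. The class name CsvLine and its Split method are made up for this example (they are not from the linked article), and the sketch assumes a field never spans more than one line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CsvLine
{
    // Splits one CSV line into fields, honoring double-quoted fields.
    // Assumption: a quoted field does not contain a line break.
    public static List<string> Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');
                        i++; // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false; // closing quote
                    }
                }
                else
                {
                    current.Append(c); // delimiters are ordinary text here
                }
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString()); // the last field has no delimiter after it
        return fields;
    }
}
```

For example, CsvLine.Split("a,\"b,c\",\"say \"\"hi\"\"\"") should yield three fields: a, then b,c, then say "hi".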
If you'd like to see how I did it, you can check out this article.
Upvotes: -1
Reputation: 25734
... is carrying on like this a wise idea ...?
Since you're doing this as a learning exercise, you may want to dig deeper into lexing and parsing theory. Your current approach will show its shortcomings fairly quickly as described in Stop Rolling Your Own CSV Parser!. It's not that parsing CSV data is difficult. (It's not.) It's just that most CSV parser projects treat the problem as a text splitting problem versus a parsing problem. If you take the time to define the CSV "language", the parser almost writes itself.
RFC 4180 defines a grammar for CSV data in ABNF form:
file = [header CRLF] record *(CRLF record) [CRLF]
header = name *(COMMA name)
record = field *(COMMA field)
name = field
field = (escaped / non-escaped)
escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
non-escaped = *TEXTDATA
COMMA = %x2C
CR = %x0D ;as per section 6.1 of RFC 2234
DQUOTE = %x22 ;as per section 6.1 of RFC 2234
LF = %x0A ;as per section 6.1 of RFC 2234
CRLF = CR LF ;as per section 6.1 of RFC 2234
TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
This grammar shows how single characters are built up to create more and more complex language elements. (As written, definitions go the opposite direction from complex to simple.)
If you start with a grammar, you can write parsing functions that mirror non-terminal grammar elements (the lowercase items). Julian M Bucknall describes the process in Writing a parser for CSV data. Take a look at Test-Driven Development with ANTLR for an example of the same process using a parser generator.
Keep in mind, there is no one accepted CSV definition. CSV data in the wild is not guaranteed to implement all of the RFC 4180 suggestions.
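To make the "one parsing function per non-terminal" idea concrete, here is a minimal sketch of a recursive-descent parser that follows the grammar above. The class name Rfc4180Parser is invented for this example and is not taken from either linked article. It deliberately deviates from the RFC in three places: the header is treated as an ordinary record, a bare CR or LF is accepted as a line break, and unquoted fields accept any character other than a comma or line break.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser: one method per non-terminal of the grammar.
public sealed class Rfc4180Parser
{
    private readonly string _text;
    private int _pos;

    public Rfc4180Parser(string text) { _text = text; }

    private bool AtEnd => _pos >= _text.Length;
    private char Current => _text[_pos];

    // file = record *(CRLF record) [CRLF]   (header handled as a plain record)
    public List<List<string>> ParseFile()
    {
        var records = new List<List<string>>();
        while (!AtEnd)
        {
            records.Add(ParseRecord());
            ConsumeLineBreak();
        }
        return records;
    }

    // record = field *(COMMA field)
    private List<string> ParseRecord()
    {
        var fields = new List<string> { ParseField() };
        while (!AtEnd && Current == ',')
        {
            _pos++; // COMMA
            fields.Add(ParseField());
        }
        return fields;
    }

    // field = (escaped / non-escaped)
    private string ParseField()
    {
        return (!AtEnd && Current == '"') ? ParseEscaped() : ParseNonEscaped();
    }

    // escaped = DQUOTE *(TEXTDATA / COMMA / CR / LF / 2DQUOTE) DQUOTE
    private string ParseEscaped()
    {
        var sb = new StringBuilder();
        _pos++; // opening DQUOTE
        while (!AtEnd)
        {
            char c = _text[_pos++];
            if (c != '"') { sb.Append(c); continue; }
            if (!AtEnd && Current == '"') { sb.Append('"'); _pos++; } // 2DQUOTE
            else return sb.ToString();                                 // closing DQUOTE
        }
        throw new FormatException("Unterminated quoted field");
    }

    // non-escaped = *TEXTDATA   (lenient: anything except comma, CR, LF)
    private string ParseNonEscaped()
    {
        int start = _pos;
        while (!AtEnd && Current != ',' && Current != '\r' && Current != '\n') _pos++;
        return _text.Substring(start, _pos - start);
    }

    // Accepts CRLF, or a lone CR or LF; anything else after a record is an error.
    private void ConsumeLineBreak()
    {
        if (AtEnd) return;
        int start = _pos;
        if (Current == '\r') _pos++;
        if (!AtEnd && Current == '\n') _pos++;
        if (_pos == start)
            throw new FormatException("Unexpected character at position " + _pos);
    }
}
```

Because the escaped rule consumes CR and LF itself, a quoted field may span several lines, which a line-by-line Split approach cannot handle. Calling new Rfc4180Parser(text).ParseFile() returns one list of fields per record.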
Upvotes: 4
Reputation: 41262
As a programming exercise (for learning and gaining experience) it is probably a very reasonable thing to do. For production code, it may be better to use an existing library mainly because the work is already done. There are quite a few things to address with a CSV parser. For example (randomly off the top of my head):
If you have a very specific input format in a very controlled environment, though, you may not need to deal with all of those.
Upvotes: 8
Reputation: 51369
Get (or make) some CSV data and write Unit Tests using NUnit or Visual Studio Testing Tools.
Be sure to test edge cases like
"csv","Data","with","a","trailing","comma",
and
"csv","Data","with,","commas","and","""quotes""","in","it"
Upvotes: 2
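The two edge-case rows suggested in the last answer make a handy first check of what a plain Split(',') does. The sketch below is only an illustration: the class NaiveSplitCheck and its members are hypothetical names, and it uses no test framework so that it stays self-contained. The same values could be asserted in NUnit or Visual Studio tests.

```csharp
using System;

// Hypothetical helper showing how a naive split treats the two sample rows.
public static class NaiveSplitCheck
{
    // "csv","Data","with","a","trailing","comma",
    public const string TrailingComma =
        "\"csv\",\"Data\",\"with\",\"a\",\"trailing\",\"comma\",";

    // "csv","Data","with,","commas","and","""quotes""","in","it"
    public const string QuotedRow =
        "\"csv\",\"Data\",\"with,\",\"commas\",\"and\",\"\"\"quotes\"\"\",\"in\",\"it\"";

    // Number of pieces produced by splitting at every comma.
    public static int PieceCount(string row)
    {
        return row.Split(',').Length;
    }
}
```

PieceCount(TrailingComma) is 7: six fields plus an empty seventh created by the trailing comma, and the tests should pin down whether that empty field is wanted. PieceCount(QuotedRow) is 9 although the row holds 8 fields, because the comma inside "with," is treated as a separator. The pieces also still carry their surrounding quotes.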