Teknos

Reputation: 411

Parsing a CSV File with C#, ignoring thousand separators

I'm working on a program that reads a CSV file and splits each line on ",". The problem is that some of the numbers contain thousand separators. Opened as a CSV file, the numbers render correctly, but viewed as a plain-text document they look like this:

Dog,Cat,100,100,Fish

As a CSV file, this row has four cells, with the values "Dog", "Cat", "100,100", "Fish". When I split on "," into an array of strings, the array contains 5 elements, when what I want is 4. Does anyone know a way to work around this?

Thanks

Upvotes: 1

Views: 3070

Answers (8)

Matt

Reputation: 3736

Do you know that it will always contain exactly four columns? If so, this quick-and-dirty LINQ code would work:

// Requires: using System; using System.Linq;
string[] elements = line.Split(',');

string element1 = elements[0];
string element2 = elements[1];

// Everything between the first two fields and the last field is the number,
// broken apart at its thousand separators; join the pieces back together.
var element3parts = elements.Skip(2).Take(elements.Length - 3);
int element3 = Convert.ToInt32(string.Join("", element3parts));

string element4 = elements[elements.Length - 1];

Not elegant, but it works.

Upvotes: 0

BugFinder

Reputation: 17858

I ran into a similar issue with fields that contained line feeds. I'm not convinced this is elegant, but for mine I basically chopped the file into lines, and if a line didn't start with a text delimiter, I appended it to the line above.

You could try something like this: step through each field. If the field has an end text delimiter, move on to the next one. If not, grab the next field, append it, and rinse and repeat until you do have an end delimiter (this allows for 1,000,000,000 etc.).
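A rough sketch of that merge loop, assuming the text delimiter is a double quote and that fields never contain escaped quotes (""). The class and method names are made up for illustration, and this is no substitute for a real CSV parser:

```csharp
using System;
using System.Collections.Generic;

public static class QuotedFieldMerger
{
    // Splits on commas, then glues back together any pieces that belong
    // to one quoted field (a field that opens with " but has not closed yet).
    public static List<string> MergeQuotedFields(string line)
    {
        var fields = new List<string>();
        string[] pieces = line.Split(',');

        for (int i = 0; i < pieces.Length; i++)
        {
            string field = pieces[i];
            bool opensWithQuote = field.Length > 0 && field[0] == '"';

            if (opensWithQuote)
            {
                // Keep appending the next piece (restoring the comma that
                // Split removed) until the field ends with a closing quote.
                while (!(field.Length > 1 && field[field.Length - 1] == '"')
                       && i + 1 < pieces.Length)
                {
                    i++;
                    field += "," + pieces[i];
                }

                // Drop the surrounding quotes.
                field = field.Trim('"');
            }

            fields.Add(field);
        }

        return fields;
    }
}
```

For a line such as Dog,Cat,"100,100",Fish this gives four fields, with the third one kept as 100,100. It only helps if the numbers are actually quoted in the file.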

(I'm caffeine deprived and hungry. I did write some code, but it was so ugly I didn't even post it.)

Upvotes: 0

Muad'Dib

Reputation: 29226

You might want to have a look at the free, open-source project FileHelpers. If you MUST use your own code, here is a primer on the CSV "standard" format.

Upvotes: 1

Chris Hodgkinson

Reputation: 1

You may be able to use Regex.Replace to get rid of specifically the third comma before parsing, using the overload described below:

Replaces up to a specified number of occurrences of a pattern specified in the Regex constructor with a replacement string, starting at a specified character position in the input string. A MatchEvaluator delegate is called at each match to evaluate the replacement.

[C#] public string Replace(string, MatchEvaluator, int, int);
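As a hedged illustration of that overload, the sketch below counts matches inside the MatchEvaluator and drops only the third comma. The class and method names are invented for the example, and it assumes the extra comma is always the third one on the line, which only holds for the exact layout in the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThirdCommaRemover
{
    // Removes the third comma on the line and leaves every other comma alone.
    public static string RemoveThirdComma(string line)
    {
        var comma = new Regex(",");
        int seen = 0;

        // count = 3: the evaluator is called for at most the first three commas.
        // startat = 0: start scanning at the beginning of the line.
        return comma.Replace(
            line,
            match => ++seen == 3 ? string.Empty : match.Value,
            3,
            0);
    }
}
```

With the line from the question, Dog,Cat,100,100,Fish becomes Dog,Cat,100100,Fish, which then splits into four fields.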

Upvotes: 0

Joel Coehoorn

Reputation: 415881

There are two common mistakes made when reading CSV data: using a split() function and using regular expressions. Both approaches are wrong, in that they are prone to corner cases such as yours and slower than they could be.

Instead, use a dedicated parser such as Microsoft.VisualBasic.FileIO.TextFieldParser, CodeProject's FastCSV or Linq2csv, or my own implementation here on Stack Overflow.

Upvotes: 6

Reed Copsey

Reputation: 564441

Typically, CSV files would wrap these elements in quotes, causing your line to be displayed as:

Dog,Cat,"100,100",Fish

This would parse correctly (if using a reasonable method, i.e. the TextFieldParser class or a third-party library) and avoid this issue.
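For reference, a minimal sketch of reading such a quoted line with TextFieldParser (namespace Microsoft.VisualBasic.FileIO). The wrapper class and method name are only for illustration, and the input is passed as a string here rather than read from a file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class QuotedCsvReader
{
    // Parses comma-delimited text, honoring fields wrapped in double quotes.
    public static List<string[]> ReadRows(string csvText)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // Each call returns the fields of the next row.
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

Given the quoted line above, the single row comes back with four fields, and the third field keeps its comma as 100,100.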

I would consider your file as an error case - and would try to correct the issue on the generation side.

That being said, if that is not possible, you will need to have more information about the data structure in the file to correct this. For example, in this case, you know you should have 4 elements - if you find five, you may need to merge back together the 3rd and 4th, since those two represent the only number within the line.
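A sketch of that merge, assuming you know the expected number of columns and which column holds the only number. The class, method, and parameter names are hypothetical:

```csharp
using System;
using System.Linq;

public static class NumberColumnMerger
{
    // Splits on commas, then folds any surplus pieces back into the single
    // numeric column at numberIndex so the result has expectedCount fields.
    public static string[] MergeNumberColumn(string line, int expectedCount, int numberIndex)
    {
        string[] parts = line.Split(',');
        int extra = parts.Length - expectedCount;
        if (extra < 0)
        {
            throw new FormatException("Line has fewer fields than expected.");
        }

        var result = new string[expectedCount];

        // Fields before the number are copied unchanged.
        Array.Copy(parts, 0, result, 0, numberIndex);

        // The number spans its own piece plus every surplus piece.
        result[numberIndex] = string.Concat(parts.Skip(numberIndex).Take(extra + 1));

        // Fields after the number are copied unchanged.
        int trailing = expectedCount - numberIndex - 1;
        Array.Copy(parts, numberIndex + extra + 1, result, numberIndex + 1, trailing);

        return result;
    }
}
```

Calling MergeNumberColumn("Dog,Cat,100,100,Fish", 4, 2) yields Dog, Cat, 100100, Fish. This only works because exactly one column can contain the stray commas.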

This is not possible in a general case, however - for example, take the following:

100,100,100

If that is 2 numbers, should it be 100100, 100, or should it be 100, 100100? There is no way to determine this without more information.

Upvotes: 3

Michael Blake

Reputation: 2168

Don't just split on the ","; split on ", ".
Better still, use a CSV library from Google, CodePlex, etc.
See also: Reading a CSV file in .NET?

Upvotes: 0

Sergei Golos

Reputation: 4350

Well, you could always split on ("\",\"") and then trim the first and last elements.

But I would look into regular expressions that match elements within "".
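Roughly, and assuming every field on the line is wrapped in double quotes (which is not the case for the file in the question), the split-and-trim idea could look like this. The class and method names are made up:

```csharp
using System;

public static class QuoteCommaSplitter
{
    // Splits on the three-character sequence "," (quote, comma, quote) and
    // strips the leftover quote from the first and last fields.
    public static string[] SplitQuoted(string line)
    {
        string[] fields = line.Split(new[] { "\",\"" }, StringSplitOptions.None);

        fields[0] = fields[0].TrimStart('"');
        int last = fields.Length - 1;
        fields[last] = fields[last].TrimEnd('"');

        return fields;
    }
}
```

A line like "Dog","Cat","100,100","Fish" then splits into four fields, because the comma inside 100,100 is not surrounded by quotes.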

Upvotes: 0
