Reputation: 3065
I have a delimited file with a few thousand lines in it, and I wrote a method to automatically detect the delimiter.
The method looks like this:
    private bool TryDetermineDelimiter(FileInfo target, out char delimiter)
    {
        char[] possibleDelimiters = new char[] { ',', ';', '-', ':' };
        using (StreamReader sr = new StreamReader(target.OpenRead()))
        {
            List<int> delimiterHits = new List<int>();
            foreach (char del in possibleDelimiters)
            {
                while (!sr.EndOfStream)
                {
                    var line = sr.ReadLine();
                    var matches = Regex.Matches(line, $"{del}(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");
                    if(matches.Count == 0)
                    {
                        sr.BaseStream.Seek(0, SeekOrigin.Begin);
                        break;
                    }
                    delimiterHits.Add(matches.Count);
                }
                if (delimiterHits.Any(d => d != delimiterHits[0]) || delimiterHits.Count == 0)
                {
                    delimiterHits.Clear();
                    continue;
                }
                delimiter = del;
                return true;
            }
        }
        delimiter = ',';
        return false;
    }
Something strange is happening: at the 5th line, the call to sr.ReadLine() returns the 5th line with the 1st line concatenated to it.
So for example:
delimited file:
col1; col2; col3; col4
val1; val2; val3; val4
val5; val6; val7; val8
...
The first 4 calls to StreamReader.ReadLine() return the expected lines, but the 5th call returns: val13; val14; val15; val16; col1; col2; col3; col4;
Stepping through, I can confirm that the loop never enters the if(matches.Count == 0) block; the correct number of delimiters is found on each iteration.
Unfortunately I can't post the contents of the actual file because it may get me in trouble, but I have ensured there is no fishy business with the line endings or other characters. The file is as expected.
I should also mention that this bug does not occur with comma-separated values, only with semicolons.
Upvotes: 1
Views: 142
Reputation: 15314
Change your code to this:
    if (matches.Count == 0)
    {
        sr.BaseStream.Seek(0, SeekOrigin.Begin);
        sr.DiscardBufferedData();
        break;
    }
By instructing the StreamReader to discard its buffer, you force it to resynchronize with the actual position of the base stream.
Other than that, the lines returned aren't really concatenated; the reader is looping back on itself. The change shown above will fix that.
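To see the effect in isolation, here is a minimal sketch, assuming a small in-memory stream stands in for the file. The class name RewindDemo and the method ReadFirstLineTwice are hypothetical names made up for this illustration; only Seek, ReadLine and DiscardBufferedData come from the code above. It reads one line, rewinds the base stream, and reads again, with and without discarding the reader's buffer. This would also be consistent with the comma case working: ',' is the first candidate, so no rewind happens before it is tested.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only; not part of the original code.
public static class RewindDemo
{
    // Reads one line, rewinds the base stream to the start, then reads one
    // more line and returns it. When discardBuffer is false, the reader keeps
    // serving data from its stale internal buffer after the Seek.
    public static string ReadFirstLineTwice(Stream stream, bool discardBuffer)
    {
        using (var sr = new StreamReader(stream, Encoding.UTF8, false, 1024, leaveOpen: true))
        {
            sr.ReadLine();                        // fills the reader's internal buffer
            stream.Seek(0, SeekOrigin.Begin);     // moves only the base stream
            if (discardBuffer)
            {
                sr.DiscardBufferedData();         // drop buffered data so the next read hits the stream
            }
            return sr.ReadLine();
        }
    }
}
```

Called on a MemoryStream holding "col1; col2\nval1; val2\n", this should return "val1; val2" without the discard (the reader simply carries on through its buffer) and "col1; col2" with it. In a real file that is larger than the buffer, the buffer can run out partway through a line, and the next read then starts at position 0 of the stream, which would explain a line that looks like the 5th and 1st lines glued together.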
Upvotes: 2