Reputation: 39
I have this type of data in a text file (CSV):
column1|column2|column3|column4|column5 (\r\n)
column1|column2|column3|column4|column5 (\r\n)
column1|column2 (\r\n)
column2 (\r\n)
column2|column3|column4|column5 (\r\n)
I would like to delete the \r\n at the end of lines 3 and 4 to get:
column1|column2|column3|column4|column5 (\r\n)
column1|column2|column3|column4|column5 (\r\n)
column1|column2/column2/column2|column3|column4|column5 (\r\n)
My idea is: if a row doesn't have 4 column separators ("|"), delete its CRLF, and repeat the operation until only correct rows remain.
This is my code:
String path = "test.csv";
// Read file
string[] readText = File.ReadAllLines(path);
// Empty the file
File.WriteAllText(path, String.Empty);
int x = 0;
int countheaders = 0;
int countlines;
using (StreamWriter writer = new StreamWriter(path))
{
    foreach (string s in readText)
    {
        if (x == 0)
        {
            countheaders = s.Where(c => c == '|').Count();
            x = 1;
        }
        countlines = 0;
        countlines = s.Where(d => d == '|').Count();
        if (countlines == countheaders)
        {
            writer.WriteLine(s);
        }
        else
        {
            string s2 = s;
            s2 = s2.ToString().TrimEnd('\r', '\n');
            writer.Write(s2);
        }
    }
}
The problem is that I'm reading the file in a single pass, so the line break on line 4 is removed and lines 4 and 5 are joined together...
Upvotes: 0
Views: 568
Reputation: 32770
You could probably do the following (can't test it now, but it should work):
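IEnumerable<string> batchValuesIn(
    IEnumerable<string> source,
    string separator,
    int size)
{
    var counter = 0;                  // values seen so far
    var buffer = new StringBuilder(); // record being rebuilt
    foreach (var line in source)
    {
        // Split(string) needs .NET Core 2.0 or later.
        var values = line.Split(separator);
        if (line.Length != 0)
        {
            // Note: a line break is treated as a boundary between two values.
            foreach (var value in values)
            {
                buffer.Append(value);
                counter++;
                if (counter % size == 0)
                {
                    // The record is complete: emit it and start a new one.
                    yield return buffer.ToString();
                    buffer.Clear();
                }
                else
                    buffer.Append(separator);
            }
        }
    }
    // Emit whatever is left over (an incomplete trailing record).
    if (buffer.Length != 0)
        yield return buffer.ToString();
}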
And you'd use it like:
var newLines = batchValuesIn(File.ReadLines(path), "|", 5);
The good thing about this solution is that you never load the entire original source into memory. You simply build the lines on the fly.
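One consequence of streaming is that the result cannot be written back to the same file while File.ReadLines still has it open. A minimal sketch of one way around that, assuming it is acceptable to write a temporary file next to the original; the class name CsvRewrite, the method RewriteInPlace and the ".tmp" suffix are hypothetical names chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvRewrite
{
    // Hypothetical helper: streams the file through `transform` into a
    // temporary file, then swaps the temporary file in for the original.
    public static void RewriteInPlace(
        string path,
        Func<IEnumerable<string>, IEnumerable<string>> transform)
    {
        var tempPath = path + ".tmp"; // assumed to be a free file name

        // WriteAllLines enumerates lazily, so only the record currently
        // being built is held in memory.
        File.WriteAllLines(tempPath, transform(File.ReadLines(path)));

        // The reader is closed once enumeration finishes, so the original
        // can now be replaced.
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

With the method above it could be called as CsvRewrite.RewriteInPlace(path, lines => batchValuesIn(lines, "|", 5)).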
DISCLAIMER: this may behave weirdly with malformed input strings.
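In particular, because the method above counts values, a line break that falls inside a field is counted as a boundary between two values. If that is the case in your data, a variant that counts separators instead (the idea described in the question) may fit better. This is only a sketch, not a tested drop-in: the names CsvRepair and MergeBrokenLines are hypothetical, and it assumes that line breaks never occur inside the last column and that the pieces of a broken field should be concatenated with nothing in between.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CsvRepair
{
    // Hypothetical helper: glues physical lines together until the record
    // contains the expected number of separators (4 for 5 columns).
    public static IEnumerable<string> MergeBrokenLines(
        IEnumerable<string> source,
        char separator,
        int expectedSeparators)
    {
        var buffer = new StringBuilder(); // record being rebuilt
        var seen = 0;                     // separators in the current record

        foreach (var line in source)
        {
            buffer.Append(line);
            seen += line.Count(c => c == separator);

            // Enough separators: the record is complete.
            // Assumption: the last column itself never contains a line break.
            if (seen >= expectedSeparators)
            {
                yield return buffer.ToString();
                buffer.Clear();
                seen = 0;
            }
        }

        // Emit an incomplete trailing record rather than dropping it silently.
        if (buffer.Length != 0)
            yield return buffer.ToString();
    }
}
```

Usage would look like var fixedLines = CsvRepair.MergeBrokenLines(File.ReadLines(path), '|', 4); and, like the method above, it only keeps the record currently being built in memory.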
Upvotes: 1