Reputation: 557
I am trying to read a text file line by line and create one line from multiple lines until the line read in has \r\n at the end. My data looks like this:
BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII\n
State Lic. #40428210000 City Lic.#4042821P\n
9/26/14 9/14/14 - 9/13/15 $175.00\n
9/20/00 9/14/00 - 9/13/01 $575.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638\n
State Lic. #24111110126; City Lic. #2411111126P\n
SEND ISSUED LICENSES TO DALLAS, TX\r\n
I want the data to look like this:
BusID|Comment1|Text\r\n
1010|"Cuautla, Inc. d/b/a 3 Margaritas VIII State Lic. #40428210000 City Lic.#4042821P 9/26/14 9/14/14 - 9/13/15 $175.00 9/20/00 9/14/00 - 9/13/01 $575.00 New License"\r\n
1020|"7-Eleven Inc., dba 7-Eleven Store #20638 State Lic. #24111110126; City Lic. #2411111126P SEND ISSUED LICENSES TO DALLAS, TX\r\n
My code is like this:
FileStream fsFileStream = new FileStream(strInputFileName, FileMode.Open,
FileAccess.Read, FileShare.ReadWrite);
using (StreamReader srStreamRdr = new StreamReader(fsFileStream))
{
while ((strDataLine = srStreamRdr.ReadLine()) != null && !blnEndOfFile)
{
//code evaluation here
}
}
I have tried:
if (strDataLine.EndsWith(Environment.NewLine))
{
blnEndOfLine = true;
}
and
if (strDataLine.Contains(Environment.NewLine))
{
blnEndOfLine = true;
}
These do not see anything at the end of the string variable. Is there a way for me to tell the true end of line so I can combine these rows into one row? Should I be reading the file differently?
Upvotes: 1
Views: 1062
Reputation: 216353
You cannot use the ReadLine method of the StreamReader here, because it strips every kind of line terminator, both \r\n and \n, before returning the line. Once the line is returned, you can never know whether the characters removed were \r\n or just \n.
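A small sketch to see this behaviour in isolation. It uses a StringReader over an in-memory string instead of a file; the class name ReadLineDemo and the sample text are made up for illustration, and StreamReader.ReadLine handles terminators the same way.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Returns the lines exactly as ReadLine hands them back.
    public static List<string> ReadAll(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // The terminator has already been stripped at this point,
                // whether it was "\n" or "\r\n".
                result.Add(line);
            }
        }
        return result;
    }
}
```

Calling ReadLineDemo.ReadAll("a\nb\r\nc") gives back three strings, and none of them contains a \r or \n character, so EndsWith(Environment.NewLine) can never be true on them.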
If the file is not really big, you can load everything into memory and split it into separate lines yourself:
// Load everything in memory
string fileData = File.ReadAllText(@"D:\temp\myData.txt");
// Split on \r\n (I don't use Environment.NewLine because it
// follows the OS conventions, which could be wrong in this context)
string[] lines = fileData.Split(new string[] { "\r\n"}, StringSplitOptions.RemoveEmptyEntries);
// Now replace the remaining \n with a space
lines = lines.Select(x => x.Replace("\n", " ")).ToArray();
foreach(string s in lines)
Console.WriteLine(s);
EDIT
If your file is really big (3.5GB, as you say), you cannot load everything into memory and need to process it in blocks. Fortunately, StreamReader provides a method called ReadBlock that allows us to implement code like this:
// Where we store the lines loaded from file
List<string> lines = new List<string>();
// Read a block of 10MB
char[] buffer = new char[1024 * 1024 * 10];
bool lastBlock = false;
string leftOver = string.Empty;
// Start the streamreader
using (StreamReader reader = new StreamReader(@"D:\temp\localtext.txt"))
{
// We exit when the last block is reached
while (!lastBlock)
{
// Read 10MB
int loaded = reader.ReadBlock(buffer, 0, buffer.Length);
// Exit if we have no more blocks to read (EOF)
if(loaded == 0) break;
// If we get fewer characters than the block size then
// we are on the last block
lastBlock = (loaded != buffer.Length);
// Prepare the working string by adding the remainder from the
// previous loop, so a \r\n split across two blocks is still found
string current = leftOver + new string(buffer, 0, loaded);
leftOver = string.Empty;
if (!lastBlock)
{
// Search the last \r\n (ordinal, so the match is not culture-sensitive)
int lastNewLinePos = current.LastIndexOf("\r\n", StringComparison.Ordinal);
if (lastNewLinePos == -1)
{
// No complete record yet: keep the text and read the next block
leftOver = current;
continue;
}
// Save the incomplete part for the next loop
leftOver = current.Substring(lastNewLinePos + 2);
current = current.Substring(0, lastNewLinePos + 2);
}
// Process the lines
AddLines(current, lines);
}
}
void AddLines(string current, List<string> lines)
{
var splitted = current.Split(new string[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
lines.AddRange(splitted.Select(x => x.Replace("\n", " ")).ToList());
}
This code assumes that your file always ends with a \r\n and that you always get a \r\n inside a block of 10MB of text. More tests are needed with your actual data.
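For a file of that size it may also help to avoid keeping every line in a List and write the result straight to an output file instead. What follows is only a sketch of that idea, not part of the code above: it reads one character at a time, turns every bare \n into a space and passes every \r\n pair through unchanged. The class name RecordJoiner and the method name Join are made up for illustration.

```csharp
using System.IO;

public static class RecordJoiner
{
    // Copies text from reader to writer, replacing each bare '\n' with a
    // space and leaving each "\r\n" pair untouched.
    public static void Join(TextReader reader, TextWriter writer)
    {
        bool pendingCr = false; // true when the previous char was '\r'
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (pendingCr)
            {
                pendingCr = false;
                if (c == '\n')
                {
                    writer.Write("\r\n"); // real end of a record
                    continue;
                }
                writer.Write('\r'); // lone '\r': pass it through
            }
            if (c == '\r')
            {
                pendingCr = true; // decide once we see the next char
            }
            else if (c == '\n')
            {
                writer.Write(' '); // line break inside a record
            }
            else
            {
                writer.Write(c);
            }
        }
        if (pendingCr)
        {
            writer.Write('\r'); // input ended right after a '\r'
        }
    }
}
```

To use it on files, wrap the input in a StreamReader and the output in a StreamWriter (both buffer internally, so the per-character calls do not hit the disk each time) and pass them to RecordJoiner.Join. Because only one character of state is kept, it does not depend on where a \r\n falls relative to any block boundary.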
Upvotes: 1
Reputation: 16310
You can read all the text by calling File.ReadAllText(path) and parse it in the following way:
string input = File.ReadAllText(your_file_path);
string output = string.Empty;
input.Split(new[] { Environment.NewLine } , StringSplitOptions.RemoveEmptyEntries).
Skip(1).ToList().
ForEach(x =>
{
output += x.EndsWith("\\r\\n") ? x + Environment.NewLine
: x.Replace("\\n"," ");
});
Upvotes: 0
Reputation: 1157
If what you have posted is exactly what's in the file, meaning the \r\n sequences are literally written out as text, you can use the following to unescape them:
strDataLine.Replace("\\r", "\r").Replace("\\n", "\n");
This ensures you can now use Environment.NewLine in your comparison, as in:
if (strDataLine.Replace("\\r", "\r").Replace("\\n", "\n").EndsWith(Environment.NewLine))
{
blnEndOfLine = true;
}
Upvotes: 0