Reputation: 2025
I am developing a system that processes sequential files generated by COBOL systems. Currently I do the data processing with a long series of substring calls to extract the fields, but I wonder if there is a more efficient way to process the file than making all of those substrings...
Right now I basically do:
using (var sr = new StreamReader("file.txt"))
{
    string line;
    while (!sr.EndOfStream)
    {
        line = sr.ReadLine();
        switch (line[0])
        {
            case '0':
                processType0(line);
                break;
            case '1':
                processType1(line);
                break;
            case '2':
                processType2(line);
                break;
            case '9':
                processType9(line);
                break;
        }
    }
}
private void processType0(string line)
{
    // 'type' and 'name' are class fields
    type = line.Substring(0, 15);
    name = line.Substring(15, 30);
    // ... and 20 more substrings
}
private void processType1(string line)
{
    // 45 substrings...
}
The file size may vary between 50 MB and 150 MB... A small example of the file:
01ARQUIVO01CIVDSUQK 00000000000000999999NAME NAME NAME NAME 892DATAFILE 200616 KY0000853 000001
1000000000000000000000000999904202589ESMSS59365 00000010000000000000026171900000000002 0 01000000000001071600000099740150000000001N020516000000000000000000000000000000000000000000000000000000000000009800000000000000909999-AAAAAAAAAAAAAAAAAAAAAAAAA 00000000 000002
1000000000000000000000000861504202589ENJNS63198 00000010000000000000036171300000000002 0 01000000000001071600000081362920000000001N020516000000000000000000000000000000000000000000000000000000000000009800000000000000909999-BBBBBBBBBBBBBBBBBBBBBBBBBB 00000000 000003
9 000004
Upvotes: 4
Views: 817
Reputation: 4116
First time with C#, but I think you want to look at something like
unsafe struct TypeOne
{
    public fixed byte recordType[1];
    public fixed byte whatThisFieldIsCalled[10];
    public fixed byte someOtherFieldName[5];
    // ...
}
And then just assign a different struct based on the line[0] case. Or, knowing next to nada about C#, that could be in completely the wrong ballpark and end up being a poor performer internally.
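For concreteness, a minimal sketch of that overlay idea in C# (this assumes single-byte ASCII records, compilation with /unsafe, and placeholder field names and widths rather than the poster's real record layout):

using System;
using System.Text;

unsafe struct TypeOne
{
    public fixed byte recordType[1];
    public fixed byte whatThisFieldIsCalled[10];
    public fixed byte someOtherFieldName[5];
}

static class FixedRecordDemo
{
    static unsafe void Main()
    {
        // One 16-byte record: a 1-byte type, a 10-byte field, a 5-byte field.
        byte[] raw = Encoding.ASCII.GetBytes("1ABCDEFGHIJ12345");

        fixed (byte* p = raw)
        {
            // Overlay the struct on the raw bytes -- no per-field copies
            // are made until a field is actually decoded.
            var record = (TypeOne*)p;
            string field = Encoding.ASCII.GetString(record->whatThisFieldIsCalled, 10);
            Console.WriteLine(field); // prints "ABCDEFGHIJ"
        }
    }
}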
Upvotes: 1
Reputation:
Frequent disk reads will slow down your code.
According to MSDN, the default buffer size for the constructor you are using is 1024 bytes. Set a larger buffer size using a different constructor:
int bufferSize = 1024 * 128;
using (var reader = new StreamReader(path, encoding, autoDetectEncoding, bufferSize))
{
...
}
C# prioritizes safety over speed, and strings are immutable, so every string function allocates a new string.
Do you really need all of those substrings? If not, then generate only the ones you need:
private static string GetType(string line)
{
    return line.Substring(0, 15);
}

if (needed)
    type = GetType(line);
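Taking the same idea a step further, here is a sketch (the field offsets are illustrative) of a wrapper that defers every substring until the field is actually read:

class Type0Record
{
    private readonly string _line;

    public Type0Record(string line)
    {
        _line = line;
    }

    // Each property allocates its substring only when accessed.
    public string Type { get { return _line.Substring(0, 15); } }
    public string Name { get { return _line.Substring(15, 30); } }
    // ...one property per remaining field
}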
Upvotes: 2
Reputation: 25320
You could try writing a parser which processes the file one character at a time.
I read a good article titled 'Writing a parser for CSV data' the other day on how to do this with CSV files; the principles are the same for most file types. It can be found here: http://www.boyet.com/articles/csvparser.html
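As a rough sketch of how that approach might look against this fixed-width format (the field widths here are illustrative, not the actual record layout):

using System;
using System.IO;
using System.Text;

static class CharAtATimeParser
{
    // Reads exactly 'width' characters from the reader, or returns null at end of file.
    private static string ReadField(TextReader reader, int width)
    {
        var sb = new StringBuilder(width);
        for (int i = 0; i < width; i++)
        {
            int c = reader.Read();
            if (c == -1)
                return null;
            sb.Append((char)c);
        }
        return sb.ToString();
    }

    public static void Parse(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string recordType;
            while ((recordType = ReadField(reader, 1)) != null)
            {
                // Consume the remaining fields of this record according to
                // its layout; ReadLine here stands in for the per-field reads.
                string rest = reader.ReadLine();
            }
        }
    }
}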
Upvotes: 1
Reputation: 3281
I love Linq
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

IEnumerable<string> ReadFile(string path)
{
    using (var reader = new StreamReader(path))
    {
        while (!reader.EndOfStream)
        {
            yield return reader.ReadLine();
        }
    }
}
void DoThing()
{
    var myMethods = new Action<string>[]
    {
        s =>
        {
            // Process 0 ('type' and 'name' are class fields)
            type = s.Substring(0, 15);
            name = s.Substring(15, 30);
            // ... and 20 more substrings
        },
        s =>
        {
            // Process 1
            type = s.Substring(0, 15);
            name = s.Substring(15, 30);
            // ... and 20 more substrings
        },
        //...
    };

    // The array needs an entry at every index that can appear as a record
    // type, so type '9' requires index 9 to be populated.
    var actions = ReadFile(@"c:\path\to\file.txt")
        .Select(line => new Action(() => myMethods[line[0] - '0'](line)))
        .ToArray();

    Array.ForEach(actions, a => a());
}
Upvotes: 0