Reputation: 2025
I am developing a system that processes sequential files generated by COBOL systems. Currently I do the data processing with a long series of substring calls to extract the fields, but I wonder if there is a more efficient way to process the file than making all of those substrings...
Right now I basically do:
using (var sr = new StreamReader("file.txt"))
{
    string line;
    while (!sr.EndOfStream)
    {
        line = sr.ReadLine();
        switch (line[0])
        {
            case '0':
                processType0(line);
                break;
            case '1':
                processType1(line);
                break;
            case '2':
                processType2(line);
                break;
            case '9':
                processType9(line);
                break;
        }
    }
}
private void processType0(string line)
{
    // 'type' and 'name' are class fields
    type = line.Substring(0, 15);
    name = line.Substring(15, 30);
    // ... and 20 more substrings
}
private void processType1(string line)
{
    // 45 substrings...
}
The file size may vary between 50 MB and 150 MB... A small example of the file:
01ARQUIVO01CIVDSUQK 00000000000000999999NAME NAME NAME NAME 892DATAFILE 200616 KY0000853 000001
1000000000000000000000000999904202589ESMSS59365 00000010000000000000026171900000000002 0 01000000000001071600000099740150000000001N020516000000000000000000000000000000000000000000000000000000000000009800000000000000909999-AAAAAAAAAAAAAAAAAAAAAAAAA 00000000 000002
1000000000000000000000000861504202589ENJNS63198 00000010000000000000036171300000000002 0 01000000000001071600000081362920000000001N020516000000000000000000000000000000000000000000000000000000000000009800000000000000909999-BBBBBBBBBBBBBBBBBBBBBBBBBB 00000000 000003
9 000004
Upvotes: 4
Views: 817
Reputation: 4116
First time with C#, but I think you want to look at something like
unsafe struct TypeOne
{
    public fixed byte recordType[1];
    public fixed byte whatThisFieldIsCalled[10];
    public fixed byte someOtherFieldName[5];
    // ...
}
And then just assign a different struct based on the line[0] case. Or, knowing next to nada about C#, that could be in completely the wrong ballpark and end up being a poor performer internally.
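For concreteness, a minimal sketch of that overlay idea in C# (this assumes single-byte ASCII records, compilation with /unsafe, and placeholder field names and widths rather than the poster's real record layout):

using System;
using System.Text;

unsafe struct TypeOne
{
    public fixed byte recordType[1];
    public fixed byte whatThisFieldIsCalled[10];
    public fixed byte someOtherFieldName[5];
}

static class FixedRecordDemo
{
    static unsafe void Main()
    {
        // One 16-byte record: a 1-byte type, a 10-byte field, a 5-byte field.
        byte[] raw = Encoding.ASCII.GetBytes("1ABCDEFGHIJ12345");

        fixed (byte* p = raw)
        {
            // Overlay the struct on the raw bytes -- no per-field copies
            // are made until a field is actually decoded.
            var record = (TypeOne*)p;
            string field = Encoding.ASCII.GetString(record->whatThisFieldIsCalled, 10);
            Console.WriteLine(field); // prints "ABCDEFGHIJ"
        }
    }
}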
Upvotes: 1
Reputation:
Frequent disk reads will slow down your code.
According to MSDN, the default buffer size for the constructor you are using is 1024 bytes. Set a larger buffer size using a different constructor:
int bufferSize = 1024 * 128;
using (var reader = new StreamReader(path, encoding, autoDetectEncoding, bufferSize))
{
...
}
C# prioritizes safety over speed, and strings are immutable, so every string function allocates a new string.
Do you really need all of those substrings? If not, then generate only the ones you need:
private static string GetType(string line)
{
    return line.Substring(0, 15);
}

if (needed)
    type = GetType(line);
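Taking the same idea a step further, here is a sketch (the field offsets are illustrative) of a wrapper that defers every substring until the field is actually read:

class Type0Record
{
    private readonly string _line;

    public Type0Record(string line)
    {
        _line = line;
    }

    // Each property allocates its substring only when accessed.
    public string Type { get { return _line.Substring(0, 15); } }
    public string Name { get { return _line.Substring(15, 30); } }
    // ...one property per remaining field
}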
Upvotes: 2
Reputation: 25320
You could try writing a parser which processes the file one character at a time.
I read a good article titled 'Writing a parser for CSV data' the other day on how to do this with CSV files; the principles are the same for most file types. It can be found here: http://www.boyet.com/articles/csvparser.html
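As a rough sketch of how that approach might look against this fixed-width format (the field widths here are illustrative, not the actual record layout):

using System;
using System.IO;
using System.Text;

static class CharAtATimeParser
{
    // Reads exactly 'width' characters from the reader, or returns null at end of file.
    private static string ReadField(TextReader reader, int width)
    {
        var sb = new StringBuilder(width);
        for (int i = 0; i < width; i++)
        {
            int c = reader.Read();
            if (c == -1)
                return null;
            sb.Append((char)c);
        }
        return sb.ToString();
    }

    public static void Parse(string path)
    {
        using (var reader = new StreamReader(path))
        {
            string recordType;
            while ((recordType = ReadField(reader, 1)) != null)
            {
                // Consume the remaining fields of this record according to
                // its layout; ReadLine here stands in for the per-field reads.
                string rest = reader.ReadLine();
            }
        }
    }
}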
Upvotes: 1
Reputation: 3281
I love Linq
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

IEnumerable<string> ReadFile(string path)
{
    using (var reader = new StreamReader(path))
    {
        while (!reader.EndOfStream)
        {
            yield return reader.ReadLine();
        }
    }
}
void DoThing()
{
    var myMethods = new Action<string>[]
    {
        s =>
        {
            // Process 0 ('type' and 'name' are class fields)
            type = s.Substring(0, 15);
            name = s.Substring(15, 30);
            // ... and 20 more substrings
        },
        s =>
        {
            // Process 1
            type = s.Substring(0, 15);
            name = s.Substring(15, 30);
            // ... and 20 more substrings
        },
        //...
    };

    // The array needs an entry at every index that can appear as a record
    // type, so type '9' requires index 9 to be populated.
    var actions = ReadFile(@"c:\path\to\file.txt")
        .Select(line => new Action(() => myMethods[line[0] - '0'](line)))
        .ToArray();

    Array.ForEach(actions, a => a());
}
Upvotes: 0