Batista

Reputation: 151

Read a text file from a specific position for a specific length

I've received a very badly formatted data file and have to write code that reads strings from a non-delimited text file, given a specific starting position and length for each field, in order to build up a workable dataset. The file is not delimited in any way, but I do have the starting and ending position of each string that I need to read. I've come up with this code, but I'm getting an error and can't figure out why, because if I replace the 395 with a 0 it works.

e.g. Invoice number starting position = 395, ending position = 414, length = 20

using (StreamReader sr = new StreamReader(@"\\t.txt"))
{                    
    char[] c = null;                   
    while (sr.Peek() >= 0)
    {
        c = new char[20];//Invoice number string
        sr.Read(c, 395, c.Length); //THIS IS GIVING ME AN ERROR                      
        Debug.WriteLine(new string(c)); // print the 20-char field
    }
}

Here is the error that I get:

System.ArgumentException: Offset and length were out of bounds for the array 
                          or count is greater than the number of elements from
                          index to the end of the source collection. at
                          System.IO.StreamReader.Read(Char[] b

Upvotes: 2

Views: 19621

Answers (6)

Elastep

Reputation: 3408

395 is the index in the c array at which Read starts writing. There is no index 395 in that array; the maximum is 19. I would suggest something like this:

StreamReader r;
...
string allFile = r.ReadToEnd();
int offset = 395;
int length = 20;

And then use

allFile.Substring(offset, length)
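Put together, a complete version of this approach might look like the following sketch (file path and field offsets taken from the question; note that ReadToEnd loads the whole file into memory, which is fine for modest file sizes):

    using System;
    using System.IO;

    class FixedWidthRead
    {
        static void Main()
        {
            int offset = 395;   // starting position of the invoice number
            int length = 20;    // field length

            using (StreamReader r = new StreamReader(@"\\t.txt"))
            {
                string allFile = r.ReadToEnd();
                string invoiceNumber = allFile.Substring(offset, length);
                Console.WriteLine(invoiceNumber);
            }
        }
    }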

Upvotes: -2

Batista

Reputation: 151

Solved this ages ago; just wanted to post the solution that was suggested.

using (StreamReader sr = new StreamReader(path2))
{
    // Add the columns once, before the loop; adding them on every
    // iteration would throw a DuplicateNameException after the first row.
    dsnonhb.Tables[0].Columns.Add("No");
    dsnonhb.Tables[0].Columns.Add("InvoiceNum");
    dsnonhb.Tables[0].Columns.Add("Odo");
    dsnonhb.Tables[0].Columns.Add("PumpVal");
    dsnonhb.Tables[0].Columns.Add("Quantity");

    string line;
    while ((line = sr.ReadLine()) != null)
    {
        DataRow myrow = dsnonhb.Tables[0].NewRow();
        myrow["No"] = rowcounter.ToString();
        myrow["InvoiceNum"] = line.Substring(741, 6);
        myrow["Odo"] = line.Substring(499, 6);
        myrow["PumpVal"] = line.Substring(609, 7);
        myrow["Quantity"] = line.Substring(660, 6);
        dsnonhb.Tables[0].Rows.Add(myrow);
        rowcounter++;
    }
}

Upvotes: 0

Peet Brits

Reputation: 3255

Please Note

Seek() is too low level for what the OP wants. See this answer instead for line-by-line parsing.

Also, as Jordan mentioned, Seek() has the issue of character encodings and varying character sizes (e.g. for non-ASCII and non-ANSI files, like UTF, which is probably not applicable to this question). Thanks for pointing that out.
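The point about variable character sizes can be seen directly: in UTF-8 a non-ASCII character occupies more than one byte, so byte offsets and character offsets diverge. A minimal illustration (example string is mine, not from the OP's data):

    using System;
    using System.Text;

    class EncodingOffsets
    {
        static void Main()
        {
            string text = "Æon";                      // 'Æ' is 2 bytes in UTF-8
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            Console.WriteLine(text.Length);           // 3 characters
            Console.WriteLine(bytes.Length);          // 4 bytes
            // Seeking the stream to byte offset 1 would land in the middle
            // of 'Æ', which is why Seek() cannot be trusted to find
            // character positions in such files.
        }
    }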


Original Answer

Seek() is only available on a stream, so try using sr.BaseStream.Seek(..), or use a different stream like such:

byte[] buffer = new byte[length];
using (Stream s = new FileStream(path, FileMode.Open))
{
    s.Seek(offset, SeekOrigin.Begin);  // jump to the byte offset
    s.Read(buffer, 0, length);         // read 'length' bytes into buffer
}

Upvotes: 4

Jordan

Reputation: 9901

I've created a class called AdvancedStreamReader in my Helpers project on GitHub here:

https://github.com/jsmunroe/Helpers/blob/master/Helpers/IO/AdvancedStreamReader.cs

It is fairly robust. It is a subclass of StreamReader and keeps all of that functionality intact. There are a few caveats: a) it resets the position of the stream when it is constructed; b) you should not seek the BaseStream while you are using the reader; c) you need to specify the newline character type if it differs from the environment, and the file must use only one type. Here are some unit tests to demonstrate how it is used.

    [TestMethod]
    public void ReadLineWithNewLineOnly()
    {
        // Setup
        var text = "ƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nƒun ‼Æ¢ with åò☺ encoding!\nHa!";
        var bytes = Encoding.UTF8.GetBytes(text);
        var stream = new MemoryStream(bytes);
        var reader = new AdvancedStreamReader(stream, NewLineType.Nl);
        reader.ReadLine();

        // Execute
        var result = reader.ReadLine();

        // Assert
        Assert.AreEqual("ƒun ‼Æ¢ with åò☺ encoding!", result);
        Assert.AreEqual(54, reader.CharacterPosition);
    }


    [TestMethod]
    public void SeekCharacterWithUtf8()
    {
        // Setup
        var text = $"ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}ƒun ‼Æ¢ with åò☺ encoding!{NL}Ha!";
        var bytes = Encoding.UTF8.GetBytes(text);
        var stream = new MemoryStream(bytes);
        var reader = new AdvancedStreamReader(stream);

        // Pre-condition assert
        Assert.IsTrue(bytes.Length > text.Length); // More bytes than characters in sample text.

        // Execute
        reader.SeekCharacter(84);

        // Assert
        Assert.AreEqual(84, reader.CharacterPosition);
        Assert.AreEqual("Ha!", reader.ReadToEnd());
    }

I wrote this for my own use, but I hope it will help other people.

Upvotes: -1

Peet Brits

Reputation: 3255

(new answer based on comments)

You are parsing invoice data, with each entry on a new line and the required data at a fixed offset on every line. Stream.Seek() is too low level for what you want to do, because you would need a separate seek for every line. Rather use the following:

int offset = 395;
int length = 20;
using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();
        string myData = line.Substring(offset, length);
    }
}
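One caveat worth noting: Substring throws an ArgumentOutOfRangeException if a line is shorter than offset + length, which fixed-width files can produce on a trailing blank or truncated line. A defensive variant of the loop body might look like:

    while (!sr.EndOfStream)
    {
        string line = sr.ReadLine();

        // Skip lines too short to contain the field
        // (e.g. a trailing blank line at the end of the file).
        if (line.Length < offset + length)
            continue;

        string myData = line.Substring(offset, length);
    }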

Upvotes: 0

Fischermaen

Reputation: 12458

Here is my suggestion for you:

using (StreamReader sr = new StreamReader(@"\\t.txt"))
{
    char[] c = new char[20];  // Invoice number string 
    sr.BaseStream.Position = 395;
    sr.Read(c, 0, c.Length); 
}
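This works because the position is changed before the first Read. If anything had already been read, repositioning the BaseStream alone would not be enough, since StreamReader buffers ahead of the underlying stream; the reposition would need to be paired with DiscardBufferedData(). A sketch of that case:

    using (StreamReader sr = new StreamReader(@"\\t.txt"))
    {
        sr.ReadLine();                              // reader has now buffered data
        sr.BaseStream.Seek(395, SeekOrigin.Begin);  // move the underlying stream
        sr.DiscardBufferedData();                   // drop the stale buffered chars

        char[] c = new char[20];
        sr.Read(c, 0, c.Length);
    }

Note that DiscardBufferedData resets the reader's internal state, so it should be used sparingly; and the byte-offset caveat for non-ASCII encodings mentioned in the other answers still applies.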

Upvotes: 0
