RNat

Reputation: 35

Get the position of a string in a text file based on the line number in C#

I have an input text file that comes from a third party, and I wrote a C# program to process it and produce results. I need to update the same file with those results, since the third party updates their DB based on this output file. To update the file, I need to find the position of the string to overwrite.

Ex: The input file looks this way:

Company Name: <some name>            ID: <some ID>
----------------------------------------------------
Transaction_ID:0000001233        Name:John    Amount:40:00  Output_Code:
-----------------------------------------------------------------------
Transaction_ID:0000001234        Name:Doe     Amount:40:00  Output_Code:
------------------------------------------------------------------------

Please note: Transaction_ID is unique in each row.

The Output file should be:

Company Name: <some name>            ID: <some ID>
----------------------------------------------------
Transaction_ID:0000001233        Name:John    Amount:40:00  Output_Code:01
-----------------------------------------------------------------------
Transaction_ID:0000001234        Name:Doe     Amount:40:00  Output_Code:02
---------------------------------------------------------------------------

The codes 01 and 02 are the results of the c# program and have to be updated in the response file.

I have code to find the positions of "Transaction_ID:0000001233" and "Output_Code:", and I am able to update the first row. But I am not able to get the position of "Output_Code:" for the second row. How do I locate the string based on the line number? I cannot rewrite the whole response file, as it has other unwanted columns; the best option here is to update the existing file in place.

long positionreturnCode1 = FileOps.Seek(filePath, "Output_Code:");
// gets the position of Output_Code in the first row
byte[] bytesToInsert = System.Text.Encoding.ASCII.GetBytes("01");
FileOps.InsertBytes(bytesToInsert, newPath, positionreturnCode1);

// the above code inserts "01" in the correct position, i.e. the first row

long positiontransId2 = FileOps.Seek(filePath, "Transaction_ID:0000001234");
long positionreturnCode2 = FileOps.Seek(filePath, "Output_Code:");

// still gets the first row's value

long pos = positionreturnCode2 - positiontransId2;

bytesToInsert = System.Text.Encoding.ASCII.GetBytes("02");
FileOps.InsertBytes(bytesToInsert, newPath, pos);

// this inserts in a completely different position

I know the logic is wrong. But I am trying to get the position of output code value in the second row.

Upvotes: 0

Views: 1399

Answers (3)

Carter

Reputation: 744

The idea here is to pass in a position based on where your program has already updated, and to keep moving that position forward by the length of whatever you inserted.

If I am reading the code and your example correctly, this overload should let you scoot along through the file.

This function is within the utils that you linked in your comment.

public static long Seek(string file, long position, string searchString)
{
    // open a FileStream, skip ahead to the given position,
    // and search from there instead of from the start of the file
    using (System.IO.FileStream fs = System.IO.File.OpenRead(file))
    {
        fs.Position = position;
        return Seek(fs, searchString);
    }
}
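Using that overload, the second row's field can be found by seeking the second Transaction_ID first, then searching for "Output_Code:" starting from that offset. A sketch, assuming `FileOps.Seek` and `FileOps.InsertBytes` behave as in the question:

```csharp
// find the second transaction, then search for "Output_Code:"
// starting from that offset so we land on the second row's field
long positionTransId2 = FileOps.Seek(filePath, "Transaction_ID:0000001234");
long positionReturnCode2 = FileOps.Seek(filePath, positionTransId2, "Output_Code:");

byte[] bytesToInsert = System.Text.Encoding.ASCII.GetBytes("02");
FileOps.InsertBytes(bytesToInsert, newPath, positionReturnCode2);
```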

Upvotes: 0

Scott Hannen

Reputation: 29207

To start with, I'll isolate the part that takes a transaction and returns a code, since I don't know what that is, and it's not relevant. (I'd do the same thing even if I did know.)

public class Transaction
{
    public Transaction(string transactionId, string name, decimal amount)
    {
        TransactionId = transactionId;
        Name = name;
        Amount = amount;
    }

    public string TransactionId { get; }
    public string Name { get; }
    public decimal Amount { get; }
}

public interface ITransactionProcessor
{
    // returns an output code
    string ProcessTransaction(Transaction transaction);
}

Now we can write something that processes a set of strings, which could be lines from a file. That's something to think about. You get the strings from a file, but would this work any different if they didn't come from a file? Probably not. Besides, manipulating the contents of a file is harder. Manipulating strings is easier. So instead of "solving" the harder problem we're just converting it into an easier problem.

For each string it's going to do the following:

  • Read a transaction, including whatever fields it needs, from the string.
  • Process the transaction and get an output code.
  • Add the output code to the end of the string.

Again, I'm leaving out the part that I don't know. For now it's in a private method, but it could be described as a separate interface.

public class StringCollectionTransactionProcessor // Horrible name, sorry.
{
    private readonly ITransactionProcessor _transactionProcessor;

    public StringCollectionTransactionProcessor(ITransactionProcessor transactionProcessor)
    {
        _transactionProcessor = transactionProcessor;
    }

    public IEnumerable<string> ProcessTransactions(IEnumerable<string> inputs)
    {
        foreach (var input in inputs)
        {
            var transaction = ParseTransaction(input);
            var outputCode = _transactionProcessor.ProcessTransaction(transaction);
            var outputLine = $"{input} {outputCode}";
            yield return outputLine;
        }
    }

    private Transaction ParseTransaction(string input)
    {
        // Get the transaction ID and whatever values you need from the string.
        throw new NotImplementedException();
    }
}

The result is an IEnumerable<string> where each string is the original input, unmodified except for the output code appended at the end. If there were any extra columns in there that weren't related to your processing, that's okay. They're still there.

There are likely other factors to consider, like exception handling, but this is a starting point. It gets simpler if we completely isolate different steps from each other so that we only have to think about one thing at a time.

As you can see, I've still left things out. For example, where do the strings come from? Do they come from a file? Where do the results go? Another file? Now it's much easier to see how to add those details. They seemed like they were the most important, but now we've rearranged this so that they're the least important.

It's easy to write code that reads a file into a collection of strings.

var inputs = File.ReadLines(path);

When you're done and you have a collection of strings, it's easy to write them to a file.

File.WriteAllLines(path, linesToWrite);

We wouldn't add those details into the above classes. If we do, we've restricted those classes to only working with files, which is unnecessary. Instead we just write a new class which reads the lines, gets a collection of strings, passes it to the other class to get processed, gets back a result, and writes it to a file.
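That new class might look something like this (a sketch; the `FileTransactionProcessor` name is mine):

```csharp
public class FileTransactionProcessor
{
    private readonly StringCollectionTransactionProcessor _processor;

    public FileTransactionProcessor(StringCollectionTransactionProcessor processor)
    {
        _processor = processor;
    }

    public void ProcessFile(string inputPath, string outputPath)
    {
        // Read lines, process them, and write the results.
        // The classes above never need to know a file was involved.
        var inputs = File.ReadLines(inputPath);
        var outputs = _processor.ProcessTransactions(inputs);
        File.WriteAllLines(outputPath, outputs);
    }
}
```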


This is an iterative process that allows us to write the parts we understand and leave the parts we haven't figured out for later. That keeps us moving forward solving one problem at a time instead of getting stuck trying to solve a few at once.

A side effect is that the code is easier to understand. It lends itself to writing methods with just a few lines. Each is easy to read. It's also much easier to write unit tests.


In response to some comments:

If the output code doesn't go at the end of the line, but somewhere in the middle, you can still update it:

line = line.Replace("Output_Code:", "Output_Code:" + outputCode);

That's messy. If the line is delimited, you could split it, find the element that contains Output_Code, and completely replace it. That way you don't get weird results if for some reason there's already an output code.
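For instance, if the fields are whitespace-delimited (an assumption based on the sample file), replacing the whole element avoids doubling up on a code that's already there:

```csharp
// split on single spaces so the original spacing survives the round trip
var parts = line.Split(' ');
for (int i = 0; i < parts.Length; i++)
{
    if (parts[i].StartsWith("Output_Code:"))
    {
        // replace the entire field, whether or not a code is already present
        parts[i] = "Output_Code:" + outputCode;
    }
}
line = string.Join(" ", parts);
```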

If the step of processing a transaction includes updating a database record, that's fine. That can all be within ITransactionProcessor.ProcessTransaction.

If you want an even safer system you could break the whole thing down into two steps. First process all of the transactions, including your database updates, but don't update the file at all.

After you're done processing all of the transactions, go back through the file and update it. You could do that by looking up the output code for each transaction in the database. Or, processing transactions could return a Dictionary<string, string> containing the transaction ids and output codes. When you're done with all the processing, go through the file a second time. For each transaction ID, see if there's an output code. If there is, update that line.
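The second pass could be sketched like this, assuming `outputCodes` is the `Dictionary<string, string>` of transaction IDs to output codes produced by the first pass:

```csharp
// second pass: update only lines whose transaction has an output code
var updatedLines = File.ReadLines(path).Select(line =>
{
    foreach (var pair in outputCodes)
    {
        if (line.Contains("Transaction_ID:" + pair.Key))
        {
            return line.Replace("Output_Code:", "Output_Code:" + pair.Value);
        }
    }
    return line; // headers, spacers, and unmatched lines pass through unchanged
}).ToList();

File.WriteAllLines(path, updatedLines);
```

Materializing the list with `ToList()` before writing matters here, because `File.ReadLines` streams the file lazily and you can't overwrite a file while still reading from it.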

Upvotes: 0

avariant

Reputation: 2300

Don't try to "edit" the existing file. There is too much room for error.

Rather, assuming that the file format will not change, parse the file into data, then rewrite the file completely. An example, in pseudo-code, is below:

public class Entry // a class rather than a struct, so entries can be updated while iterating the list
{
    public string TransactionID;
    public string Name;
    public string Amount;
    public string Output_Code;
}

Iterate through the file and create a list of Entry instances, one for each file line, and populate each Entry with the contents of its line. It looks like you can split the text line using whitespace as a delimiter, and then further split each field on ':'.
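That parsing step might look like this (a sketch, assuming fields are whitespace-separated "Key:value" pairs as in the sample):

```csharp
var entrylist = new List<Entry>();
foreach (string line in File.ReadLines(path))
{
    if (!line.StartsWith("Transaction_ID:"))
        continue; // skip the header and the spacer lines

    string[] fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    var entry = new Entry();
    foreach (string field in fields)
    {
        // split on the first ':' only, since values like the amount
        // in the sample ("40:00") can themselves contain ':'
        string[] pair = field.Split(new[] { ':' }, 2);
        switch (pair[0])
        {
            case "Transaction_ID": entry.TransactionID = pair[1]; break;
            case "Name": entry.Name = pair[1]; break;
            case "Amount": entry.Amount = pair[1]; break;
        }
    }
    entrylist.Add(entry);
}
```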

Then, for each entry, you set the Output_Code during your processing phase.

foreach(Entry entry in entrylist)
   entry.Output_Code = MyProcessingOfTheEntryFunction(entry);

Finally iterate through your list of entries and rewrite the entire file using the data in your Entry list. (Making sure to correctly write the header and any line spacers, etc..)

OpenFile();
WriteFileHeader();
foreach(Entry entry in entrylist)
{
   WriteLineSpacer();
   WriteEntryData(entry);
}
CloseFile();
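Concretely, the rewrite might use a StreamWriter. A sketch, where `companyName` and `companyId` are assumed to have been captured from the original header, and the spacer widths are taken from the sample file:

```csharp
using (var writer = new StreamWriter(path, false))
{
    writer.WriteLine("Company Name: {0}            ID: {1}", companyName, companyId);
    writer.WriteLine(new string('-', 52));
    foreach (Entry entry in entrylist)
    {
        writer.WriteLine("Transaction_ID:{0}        Name:{1}    Amount:{2}  Output_Code:{3}",
            entry.TransactionID, entry.Name, entry.Amount, entry.Output_Code);
        writer.WriteLine(new string('-', 71));
    }
}
```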

Upvotes: 1
