Philo

Reputation: 1989

Reading text Files - single line vs. multiple lines

I am working on a scenario where I have to read a text file, parse it, extract meaningful information from it, run SQL queries with that information, and then produce a response (an output file).

I have about 3000 lines of code, and everything is working as expected. However, I have been thinking about a conundrum that could disrupt my project.

The text file being read (let's call it Text.txt) may consist of a single line or multiple lines.

In my case, a 'line' is identified by its segment name (say ISA, BHT, HB, NM1, etc.), and the end of each segment is marked by the special character '~'.

Now if the file consists of multiple lines (such that each line corresponds to a single segment); say:-

ISA....... ~

NM1....... ~

DMG....... ~

SE........ ~

and so on.... then my code essentially reads each 'line' (i.e. each segment), one at a time and stores it into a temp buffer using the following command :-

         ReadLn(myFile,buffer);

and then performs evaluations based on each line. Produces the desired output. No problems.


However the issue is... what if the file consists of only a single line (consisting of multiple segments), represented as:-

ISA....... ~NM1....... ~DMG....... ~SE........ ~

then my ReadLn call reads the entire line instead of one segment at a time. This doesn't work for my code.

I was thinking about writing an if/else pair based on how many lines Text.txt contains, such as:-

if lines = 1: extract one segment at a time, separated by the special character '~', and perform the necessary tasks (3000 lines of code)

else if lines > 1: extract one line at a time (each line corresponding to a segment) and perform the necessary tasks (3000 lines of code)

Now the 3000 lines of code are repeated twice, and I don't find it elegant to copy and paste all of that code.

I would appreciate some feedback on how to solve this so that, regardless of whether the file has one line or multiple lines, my evaluation code only ever sees one segment at a time.

Upvotes: 1

Views: 1539

Answers (2)

David Dubois

Reputation: 3932

There are many possible ways of doing this. Which is best for you might depend on how long these files are and how important performance is.

A simple solution is to just read characters one at a time until you hit your tilde delimiter. The routine ReadOneItem below shows how this can be done.

procedure TForm1.Button1Click(Sender: TObject);
const
  FileName = 'c:\kuiper\test2.txt';
var
  MyFile : textfile;
  Buffer : string;

  // Read one item from text file MyFile.
  // Load characters one at a time.
  // Ignore CR and LF characters
  // Stop reading at end-of-file, or when a '~' is read

  function ReadOneItem : string;
  var
    C : char;
  begin
    Result := '';

    // loop continues until break
    while true do
      begin

        // are we at the end-of-file? If so we're done
        if eof(MyFile) then
          break;

        // read in the next character
        read ( MyFile, C );

        // ignore CR and LF
        if ( C = #13 ) or ( C = #10 ) then
          {do nothing}
        else
          begin

            // add the character to the end
            Result := Result + C;

            // if this is the delimiter then stop reading
            if C = '~' then
              break;
          end;
      end;
  end;


begin
  assignfile ( MyFile, FileName );
  reset ( MyFile );
  try

    while not EOF(MyFile) do
      begin
        Buffer := ReadOneItem;
        // a line break after the last '~' yields an empty item; skip it
        if Buffer <> '' then
          Memo1 . Lines . Add ( Buffer );
      end;

  finally
    closefile ( MyFile );
  end;
end;
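
If the file is small enough to hold in memory, another option is to split the whole content on '~' first and then feed each segment to a single processing routine, so the evaluation code exists only once. The following is a minimal sketch under that assumption, not a definitive implementation: SplitIntoSegments and ProcessSegment are hypothetical names that do not appear in the question, and the sample string stands in for the content of Text.txt. Unlike ReadOneItem above, this sketch drops the trailing '~' from each segment.

```pascal
program SplitSegments;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: splits raw file content into segments on '~',
// ignoring CR and LF wherever they occur. The '~' itself is not kept.
procedure SplitIntoSegments(const Raw: string; Segments: TStrings);
var
  I: Integer;
  Current: string;
begin
  Segments.Clear;
  Current := '';
  for I := 1 to Length(Raw) do
    case Raw[I] of
      #13, #10: ; // line breaks carry no meaning here, so skip them
      '~':
        begin
          Segments.Add(Current);
          Current := '';
        end;
    else
      Current := Current + Raw[I];
    end;
  // keep a final segment that is missing its terminating '~'
  if Current <> '' then
    Segments.Add(Current);
end;

// Hypothetical stand-in for the existing evaluation code.
// It is written once and called for every segment.
procedure ProcessSegment(const Segment: string);
begin
  WriteLn('Segment: ', Segment);
end;

var
  Segments: TStringList;
  I: Integer;
begin
  Segments := TStringList.Create;
  try
    // Sample input mixing both layouts: one segment on its own line,
    // then two segments sharing a line.
    SplitIntoSegments('ISA*00~'#13#10'NM1*IL~DMG*D8~', Segments);
    for I := 0 to Segments.Count - 1 do
      ProcessSegment(Segments[I]);
  finally
    Segments.Free;
  end;
end.
```

Because the splitting happens before any evaluation, the single-line and multi-line layouts go through the same loop, and the if/else on the number of lines is no longer needed.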

Upvotes: 1

Remy Lebeau

Reputation: 596703

I would use a file mapping via the Win32 API CreateFileMapping() and MapViewOfFile() functions, and then just parse the raw data as-is, scanning for ~ characters and ignoring any line breaks you might encounter in between each segment. For example:

var
  hFile: THandle;
  hMapping: THandle;
  pView: Pointer;
  FileSize, I: DWORD;
  pSegmentStart, pSegmentEnd: PAnsiChar;
  sSegment: AnsiString;
begin
  hFile := CreateFile('Path\To\Text.txt', GENERIC_READ, FILE_SHARE_READ, nil, OPEN_EXISTING, 0, 0);
  if hFile = INVALID_HANDLE_VALUE then RaiseLastOSError;
  try
    FileSize := GetFileSize(hFile, nil);
    if FileSize = INVALID_FILE_SIZE then RaiseLastOSError;
    if FileSize > 0 then
    begin
      hMapping := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, FileSize, nil);
      if hMapping = 0 then RaiseLastOSError;
      try
        pView := MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, FileSize);
        if pView = nil then RaiseLastOSError;
        try
          pSegmentStart := PAnsiChar(pView);
          pSegmentEnd := pSegmentStart;
          I := 0;
          while I < FileSize do
          begin
            if pSegmentEnd^ = '~' then
            begin
              SetString(sSegment, pSegmentStart, Integer(pSegmentEnd-pSegmentStart));
              // use sSegment as needed...
              pSegmentStart := pSegmentEnd + 1;
              Inc(I);
              while (I < FileSize) and (pSegmentStart^ in [#13, #10]) do
              begin
                Inc(pSegmentStart);
                Inc(I);
              end;
              pSegmentEnd := pSegmentStart;
            end else
            begin
              Inc(pSegmentEnd);
              Inc(I);
            end;
          end;
          if pSegmentEnd > pSegmentStart then
          begin
            SetString(sSegment, pSegmentStart, Integer(pSegmentEnd-pSegmentStart));
            // use sSegment as needed...
          end;
        finally
          UnmapViewOfFile(pView);
        end;
      finally
        CloseHandle(hMapping);
      end;
    end;
  finally
    CloseHandle(hFile);
  end;
end;

Upvotes: 0
