Reputation: 1989
I am working on a scenario where I have to read a text file, parse it, extract meaningful information from it, run SQL queries with that information, and then produce a response output file.
I have about 3000 lines of code, and everything is working as expected. However, I have been thinking about a conundrum that could disrupt my project.
The text file being read (let's call it Text.txt) may consist of a single line or multiple lines.
In my case, a 'line' is identified by its segment name, say ISA, BHT, HB, NM1, etc. The end of each segment is marked by the special character '~'.
If the file consists of multiple lines (such that each line corresponds to a single segment), say:
ISA....... ~
NM1....... ~
DMG....... ~
SE........ ~
and so on, then my code reads each 'line' (i.e. each segment) one at a time and stores it in a temporary buffer using the following command:
ReadLn(myFile,buffer);
and then evaluates each line and produces the desired output. No problems.
However, the issue is: what if the file consists of only a single line (containing multiple segments), such as:
ISA....... ~NM1....... ~DMG....... ~SE........ ~
then my ReadLn call reads the entire line instead of one segment at a time. This doesn't work for my code.
I was thinking about creating an if/else pair based on how many lines Text.txt contains, such as:
if line count = 1 then
    extract one segment at a time, separated by the special character '~'
    perform necessary tasks (3000 lines of code)
else if line count > 1 then
    extract one line at a time (each corresponding to a segment)
    perform necessary tasks (3000 lines of code)
Now the 3000 lines of code are repeated twice, and I don't find it elegant to copy and paste all of that code.
I would appreciate feedback on how to solve this, so that regardless of whether the file has one line or many, I only process one segment at a time when I evaluate it.
Upvotes: 1
Views: 1539
Reputation: 3932
There are many possible ways of doing this. Which is best for you might depend on how long these files are and how important performance is.
A simple solution is to just read characters one at a time until you hit your tilde delimiter. The routine ReadOneItem below shows how this can be done.
procedure TForm1.Button1Click(Sender: TObject);
const
  FileName = 'c:\kuiper\test2.txt';
var
  MyFile : textfile;
  Buffer : string;

  // Read one item from text file MyFile.
  // Load characters one at a time.
  // Ignore CR and LF characters
  // Stop reading at end-of-file, or when a '~' is read
  function ReadOneItem : string;
  var
    C : char;
  begin
    Result := '';
    // loop continues until break
    while true do
    begin
      // are we at the end-of-file? If so we're done
      if eof(MyFile) then
        break;
      // read in the next character
      read ( MyFile, C );
      // ignore CR and LF
      if ( C = #13 ) or ( C = #10 ) then
        {do nothing}
      else
      begin
        // add the character to the end
        Result := Result + C;
        // if this is the delimiter then stop reading
        if C = '~' then
          break;
      end;
    end;
  end;

begin
  assignfile ( MyFile, FileName );
  reset ( MyFile );
  try
    while not EOF(MyFile) do
    begin
      Buffer := ReadOneItem;
      // a line break after the last '~' yields an empty item; skip it
      if Buffer <> '' then
        Memo1.Lines.Add ( Buffer );
    end;
  finally
    closefile ( MyFile );
  end;
end;
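If the files are small enough to hold in memory, another option is to split the whole content into segments first and then feed each segment to a single processing routine, so the 3000 lines of processing code exist only once. The following is a minimal sketch under that assumption, not a tested drop-in: SplitSegments is a hypothetical helper name, and the sample data is made up. Unlike ReadOneItem, it drops the trailing '~' from each segment.

```pascal
program SplitSegmentsDemo;

{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes, SysUtils;

// Hypothetical helper: splits raw file content into segments at each '~'.
// CR and LF are dropped, so one-line and multi-line files give the same list.
procedure SplitSegments(const Data: string; Segments: TStrings);
var
  I: Integer;
  Current: string;
begin
  Segments.Clear;
  Current := '';
  for I := 1 to Length(Data) do
    case Data[I] of
      #13, #10: ; // ignore line breaks
      '~':
        begin
          // delimiter reached: store the segment (without the '~')
          Segments.Add(Current);
          Current := '';
        end;
    else
      Current := Current + Data[I];
    end;
  // keep any trailing text that has no closing delimiter
  if Current <> '' then
    Segments.Add(Current);
end;

var
  Segments: TStringList;
  I: Integer;
begin
  Segments := TStringList.Create;
  try
    // made-up sample: two segments on one line, a third on the next line
    SplitSegments('ISA*1~NM1*2~'#13#10'DMG*3~', Segments);
    for I := 0 to Segments.Count - 1 do
      WriteLn(Segments[I]); // call the existing per-segment processing here
  finally
    Segments.Free;
  end;
end.
```

In a real program the Data argument would come from the file itself (for example, read into a string via a TStringList or a stream), and the WriteLn call would be replaced by the existing per-segment evaluation code.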
Upvotes: 1
Reputation: 596703
I would use a file mapping via the Win32 API CreateFileMapping() and MapViewOfFile() functions, and then just parse the raw data as-is, scanning for ~ characters and ignoring any line breaks you might encounter in between each segment. For example:
var
  hFile: THandle;
  hMapping: THandle;
  pView: Pointer;
  FileSize, I: DWORD;
  pSegmentStart, pSegmentEnd: PAnsiChar;
  sSegment: AnsiString;
begin
  hFile := CreateFile('Path\To\Text.txt', GENERIC_READ, FILE_SHARE_READ, nil, OPEN_EXISTING, 0, 0);
  if hFile = INVALID_HANDLE_VALUE then RaiseLastOSError;
  try
    FileSize := GetFileSize(hFile, nil);
    if FileSize = INVALID_FILE_SIZE then RaiseLastOSError;
    if FileSize > 0 then
    begin
      hMapping := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, FileSize, nil);
      if hMapping = 0 then RaiseLastOSError;
      try
        pView := MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, FileSize);
        if pView = nil then RaiseLastOSError;
        try
          pSegmentStart := PAnsiChar(pView);
          pSegmentEnd := pSegmentStart;
          I := 0;
          while I < FileSize do
          begin
            if pSegmentEnd^ = '~' then
            begin
              // copy the segment, excluding the '~' delimiter
              SetString(sSegment, pSegmentStart, Integer(pSegmentEnd-pSegmentStart));
              // use sSegment as needed...
              pSegmentStart := pSegmentEnd + 1;
              Inc(I);
              // skip any line breaks that follow the delimiter
              while (I < FileSize) and (pSegmentStart^ in [#13, #10]) do
              begin
                Inc(pSegmentStart);
                Inc(I);
              end;
              pSegmentEnd := pSegmentStart;
            end else
            begin
              Inc(pSegmentEnd);
              Inc(I);
            end;
          end;
          // handle a final segment that has no trailing '~'
          if pSegmentEnd > pSegmentStart then
          begin
            SetString(sSegment, pSegmentStart, Integer(pSegmentEnd-pSegmentStart));
            // use sSegment as needed...
          end;
        finally
          UnmapViewOfFile(pView);
        end;
      finally
        CloseHandle(hMapping);
      end;
    end;
  finally
    CloseHandle(hFile);
  end;
end;
Upvotes: 0