Reputation: 83
We often replace non-desirable characters in a file with another "good" character.
The interface is:
procedure cleanfileASCII2(vfilename: string; vgood: integer; voutfilename: string);
To replace all non-desirables with a space we might call, cleanfileASCII2(original.txt, 32 , cleaned.txt)
The problem is that this takes a rather long time. Is there a better way to do it than shown?
procedure cleanfileASCII2(vfilename: string; vgood: integer; voutfilename:
string);
var
F1, F2: file of char;
Ch: Char;
tempfilename: string;
i,n,dex: integer;
begin
//original
AssignFile(F1, vfilename);
Reset(F1);
//outputfile
AssignFile(F2,voutfilename);
Rewrite(F2);
while not Eof(F1) do
begin
Read(F1, Ch);
//
n:=ord(ch);
if ((n<32)or(n>127))and (not(n in [10,13])) then
begin // bad char
if vgood<> -1 then
begin
ch:=chr(vgood);
Write(F2, Ch);
end
end
else //good char
Write(F2, Ch);
end;
CloseFile(F2);
CloseFile(F1);
end;
Upvotes: 3
Views: 4620
Reputation: 1
Probably the easiest method would be:
vote this post +1 if it helped please
Upvotes: 0
Reputation: 884
The problem has to do with how you're treating the buffer. Memory transfers are the most expensive part of any operation. In this case, you're looking at the file byte by byte. By changing to a blockread or buffered read, you will realize an enormous increase in speed. Note that the correct buffer size varies based on where you are reading from. For a networked file, you will find extremely large buffers may be less efficient due to the packet size TCP/IP imposes. Even this has become a bit murky with large packets from gigE but, as always, the best result is to benchmark it.
I converted from standard reads to a file stream just for convenience. You could easily do the same thing with a blockread. In this case, I took a 15MB file and ran it through your routine. It took 131,478ms to perform the operation on a local file. With the 1024 buffer, it took 258ms.
procedure cleanfileASCII3(vfilename: string; vgood: integer; voutfilename:string);
const bufsize=1023;
var
inFS, outFS:TFileStream;
buffer: array[0..bufsize] of byte;
readSize:integer;
tempfilename: string;
i: integer;
begin
if not FileExists(vFileName) then exit;
inFS:=TFileStream.Create(vFileName,fmOpenRead);
inFS.Position:=0;
outFS:=TFileStream.Create(vOutFileName,fmCreate);
while not (inFS.Position>=inFS.Size) do
begin
readSize:=inFS.Read(buffer,sizeof(buffer));
for I := 0 to readSize-1 do
begin
n:=buffer[i];
if ((n<32)or(n>127)) and (not(n in [10,13])) and (vgood<>-1) then
buffer[i]:=vgood;
end;
outFS.Write(buffer,readSize);
end;
inFS.Free;
outFS.Free;
end;
Upvotes: 6
Reputation: 391704
Several improvements:
here's a stab, that is untested (no compiler in front of me right now):
procedure cleanfileASCII2(vfilename: string; vgood: integer; voutfilename: string);
var
f1, f2: File;
table: array[Char] of Char;
index, inBuffer: Integer;
buffer: array[0..2047] of Char;
c: Char;
begin
for c := #0 to #31 do
table[c] := ' ';
for c := #32 to #127 do
table[c] := c;
for c := #128 to #255 do
table[c] := ' ';
table[#10] := #10; // exception to spaces <32
table[#13] := #13; // exception to spaces <32
AssignFile(F1, vfilename);
Reset(F1, 1);
AssignFile(F2,voutfilename);
Rewrite(F2, 1);
while not Eof(F1) do
begin
BlockRead(f1, buffer, SizeOf(buffer), inBuffer);
for index := 0 to inBuffer - 1 do
buffer[index] := table[buffer[index]];
BlockWrite(f2, buffer, inBuffer);
end;
Close(f2);
Close(f1);
end;
Upvotes: 2
Reputation: 43158
Buffering is the correct way to do that. I modified your code to see the difference:
procedure cleanfileASCII2(vfilename: string; vgood: integer; voutfilename:
string);
var
F1, F2: file;
NumRead, NumWritten: Integer;
Buf: array[1..2048] of Char;
Ch: Char;
i, n: integer;
begin
AssignFile(F1, vfilename);
Reset(F1, 1); // Record size = 1
AssignFile(F2, voutfilename);
Rewrite(F2, 1); // Record size = 1
repeat
BlockRead(F1, Buf, SizeOf(Buf), NumRead);
for i := 1 to NumRead do
begin
Ch := Buf[i];
//
n := ord(ch);
if ((n<32)or(n>127))and (not(n in [10,13])) then
begin // bad char
if vgood <> -1 then
begin
ch := chr(vgood);
Buf[i] := Ch;
end
//else //good char
//Write(F2, Ch);
end;
end;
BlockWrite(F2, Buf, NumRead, NumWritten);
until (NumRead = 0) or (NumWritten <> NumRead);
CloseFile(F1);
CloseFile(F2);
end;
Upvotes: 1
Reputation: 172
Don't try to optimize without know where.
You shoud use the Sampling Profiler (delphitools.info) to know where is the bottleneck. It's easy to use.
Precompute the vgood chr conversion, before the loop.
Also, You don't need some conversions: Ord() and Chr(). Use always the 'Ch' variable.
if not (ch in [#10, #13, #32..#127]) then
Upvotes: 0
Reputation: 13454
I did it this way, ensuring that the file I/O is done all in one go before the processing. The code could do with updating for unicode but it copes with nasty text characters such as nulls and gives you a TStrings capability. Bri
procedure TextStringToStringsAA( AStrings : TStrings; const AStr: Ansistring);
// A better routine than the stream 'SetTextStr'.
// Nulls (#0) which might be in the file e.g. from corruption in log files
// do not terminate the reading process.
var
P, Start, VeryEnd: PansiChar;
S: ansistring;
begin
AStrings.BeginUpdate;
try
AStrings.Clear;
P := Pansichar( AStr );
VeryEnd := P + Length( AStr );
if P <> nil then
while P < VeryEnd do
begin
Start := P;
while (P < VeryEnd) and not CharInSet(P^, [#10, #13]) do
Inc(P);
SetString(S, Start, P - Start);
AStrings.Add(string(S));
if P^ = #13 then Inc(P);
if P^ = #10 then Inc(P);
end;
finally
AStrings.EndUpdate;
end;
end;
procedure TextStreamToStrings( AStream : TStream; AStrings : TStrings );
// An alternative to AStream.LoadFromStream
// Nulls (#0) which might be in the file e.g. from corruption in log files
// do not terminate the reading process.
var
Size : Integer;
S : Ansistring;
begin
AStrings.BeginUpdate;
try
// Make a big string with all of the text
Size := AStream.Size - AStream.Position;
SetString( S, nil, Size );
AStream.Read(Pointer(S)^, Size);
// Parse it
TextStringToStringsAA( AStrings, S );
finally
AStrings.EndUpdate;
end;
end;
procedure LoadStringsFromFile( AStrings : TStrings; const AFileName : string );
// Loads this strings from a text file
// Nulls (#0) which might be in the file e.g. from corruption in log files
// do not terminate the reading process.
var
ST : TFileStream;
begin
ST := TFileStream.Create( AFileName, fmOpenRead + fmShareDenyNone);
// No attempt is made to prevent other applications from reading from or writing to the file.
try
ST.Position := 0;
AStrings.BeginUpdate;
try
TextStreamToStrings( ST, AStrings );
finally
AStrings.EndUpdate;
end;
finally
ST.Free;
end;
end;
Upvotes: 0
Reputation: 4440
You could buffer your input and output so you read a chunk of characters (even the whole file, if it's not too big) into an array, then process the array, then write the entire array to the output file.
In most of these cases, the disk IO is the bottleneck, and if you can do fewer large reads instead of many small reads, it will be faster.
Upvotes: 1