Reputation: 21124
In a function that reads data (data meaning exclusively strings) from disk, which should I prefer? Which is better?
A) DiskStream.Read(Pointer(s)^, Count)
or
B) DiskStream.Read(s[1], Count)
Note:
I know both produce the same result.
I know that I have to call SetLength on S before calling Read.
UPDATE
S is AnsiString.
Here is the full function:
{ Reads a run of chars from the file. Why 'ReadChars' and not 'ReadString'? This function reads C++ strings, whose length was not written to disk, so I have to pass the number of chars to read as a parameter. }
function TMyStream.ReadChars(out s: AnsiString; CONST Count: Longint): Boolean;
begin
  SetLength(s, Count);
  Result:= Read(s[1], Count)= Count;
end;
Speed test
In my speed test the first approach was a tiny bit faster than the second one. I used a 400MB file from which I read strings about 200000 times. The process was set to High priority.
The best read time ever was:
1.35 for variant B and 1.37 for variant A.
Average:
On average, B was also about 20 ms faster than A.
The test was repeated 15 times for each variant.
The difference is really small and could fall within measurement error. It would probably become significant if I read strings more often and from a bigger file. But for the moment, let's say that both lines of code perform the same.
ANSWER
Variant A - might be a tiny bit faster.
Variant B - is (obviously) much easier to read and is more Delphi-ish. It is my preferred option.
Note:
I have seen Embarcadero use variant A in the TStreamReadBuffer example, but with a TBytes instead of a String.
Upvotes: 19
Views: 1017
Reputation: 2361
If there is ever any chance that your function will be called with a Count of 0, then A) will work, with Pointer(s) simply evaluating to nil, while B) will crash with a range check exception (when range checking is enabled).
If you want to use B) and still handle counts of 0 gracefully, you should use:
function TMyStream.ReadChars(out s: AnsiString; const Count: Integer): Boolean;
begin
  SetLength(s, Count);
  Result := (Count = 0) or (Read(s[1], Count) = Count);
end;
Upvotes: 4
Reputation: 612794
I'd always use the second one which maintains type safety. I don't really buy the performance argument since you are about to hit the disk at worst, or file cache, or main memory, all of which are going to make a handful of CPU operations look somewhat trivial. Correctness should be given higher priority than performance.
However, I would add that this is not something that should be bothering you too much since you should write this particular piece of code once and once only. Put it in a helper class and wrap it up well. Feel free to care about optimisation, re-write it as assembler, whatever takes your fancy. But don't repeat yourself.
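As a rough sketch of the "write it once and wrap it up" advice, the read could live in a class helper for TStream. The unit name StreamStringHelper and the helper type TStreamStringHelper are hypothetical names chosen for illustration, and class helpers assume a reasonably recent Delphi version; this is not the asker's actual code.

```pascal
unit StreamStringHelper; // hypothetical unit name

interface

uses
  Classes;

type
  // Hypothetical helper: adds ReadChars to every TStream descendant.
  TStreamStringHelper = class helper for TStream
    // Reads exactly Count chars into s; returns False on a short read
    // or a negative Count.
    function ReadChars(out s: AnsiString; Count: Longint): Boolean;
  end;

implementation

function TStreamStringHelper.ReadChars(out s: AnsiString; Count: Longint): Boolean;
begin
  if Count <= 0 then
  begin
    // Nothing to read: never index s[1] on an empty string.
    s := '';
    Result := Count = 0;
    Exit;
  end;
  SetLength(s, Count);
  Result := Read(s[1], Count) = Count;
end;

end.
```

With the unit in the uses clause, any stream can then be read with a call such as DiskStream.ReadChars(s, Count), and the choice between s[1] and Pointer(s)^ is made in exactly one place.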
Upvotes: 5
Reputation: 43023
Be aware that of
1. DiskStream.Read(Pointer(s)^, Count)
2. DiskStream.Read(s[1], Count)
the first version will be faster.
But you must be sure that the s variable is explicitly local, or that you have called UniqueString(s) yourself before the loop.
Since pointer(s)^ won't make the hidden low-level UniqueString?() RTL call, it will be faster than s[1], but you may overwrite existing data if the s string variable is shared between the current context and another one (e.g. if the last content of s was retrieved from a function or a property value, or if s is passed as a parameter to another method).
In fact, the fastest correct way to read an AnsiString from stream content is:
s := '';
SetLength(s,Count);
DiskStream.Read(pointer(s)^,Count);
or
SetString(s,nil,Count);
DiskStream.Read(pointer(s)^,Count);
The 2nd version is equivalent to the 1st, but with one line less.
Setting s to '' will call FreeMem()+AllocMem() instead of ReallocMem() in SetLength(), so it will avoid a call to move() and will therefore be a bit faster.
In fact, the UniqueString?() RTL call generated by s[1] will be very fast, since you have already called SetLength() before it: s is therefore already unique, and the UniqueString?() RTL call will return almost immediately. After profiling, there is not much speed difference between the two versions: almost all the time is spent in string allocation and in moving content from disk. Perhaps s[1] is found to be more "pascalish".
Upvotes: 14
Reputation: 1826
The second one (DiskStream.Read(s[1], Count)). Whenever you encounter an untyped var parameter it reads like "take the address of what is passed as a parameter". So in this case you are passing the address of the first character of the string s, which is what you intended to do.
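To illustrate how an untyped var parameter receives an address, here is a minimal sketch. The procedure FillWith is a hypothetical stand-in for Stream.Read; it only wraps the standard FillChar routine.

```pascal
program UntypedVarDemo; // hypothetical demo program
{$APPTYPE CONSOLE}

// Buffer is an untyped var parameter: the callee receives only the
// address of whatever the caller passes, with no type information.
procedure FillWith(var Buffer; Count: Integer; Value: Byte);
begin
  FillChar(Buffer, Count, Value);
end;

var
  s: AnsiString;
begin
  SetLength(s, 3);

  // Passes the address of the first character of s.
  FillWith(s[1], Length(s), Ord('x'));
  Writeln(s); // prints "xxx"

  // Passes the same address, obtained through a cast and a dereference.
  FillWith(Pointer(s)^, Length(s), Ord('y'));
  Writeln(s); // prints "yyy"
end.
```

Both calls hand the callee the same memory location; the difference lies only in how the caller spells it and in the extra uniqueness check the compiler emits for s[1].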
Upvotes: 1
Reputation: 27493
If you care about optimization, you should prefer the first variant. Just look at the code generated by the compiler:
Unit7.pas.98: Stream.Read(Pointer(S)^, 10);
00470EA9 8B55FC mov edx,[ebp-$04]
00470EAC B90A000000 mov ecx,$0000000a
00470EB1 8BC6 mov eax,esi
00470EB3 8B18 mov ebx,[eax]
00470EB5 FF530C call dword ptr [ebx+$0c]
Unit7.pas.99: Stream.Read(s[1], 10);
00470EB8 8B5DFC mov ebx,[ebp-$04]
00470EBB 85DB test ebx,ebx
00470EBD 7418 jz $00470ed7
00470EBF 8BC3 mov eax,ebx
00470EC1 83E80A sub eax,$0a
00470EC4 66833802 cmp word ptr [eax],$02
00470EC8 740D jz $00470ed7
00470ECA 8D45FC lea eax,[ebp-$04]
00470ECD 8B55FC mov edx,[ebp-$04]
00470ED0 E8CB3FF9FF call @InternalUStrFromLStr
00470ED5 8BD8 mov ebx,eax
00470ED7 8D45FC lea eax,[ebp-$04]
00470EDA E89950F9FF call @UniqueStringU
00470EDF 8BD0 mov edx,eax
00470EE1 B90A000000 mov ecx,$0000000a
00470EE6 8BC6 mov eax,esi
00470EE8 8B18 mov ebx,[eax]
00470EEA FF530C call dword ptr [ebx+$0c]
UPDATE
The above code was generated by the Delphi 2009 compiler. You can improve it by using the {$STRINGCHECKS OFF} directive, but you still have the overhead of the UniqueStringU function call:
Unit7.pas.100: Stream.Read(s[1], 10);
00470EB8 8D45FC lea eax,[ebp-$04]
00470EBB E8B850F9FF call @UniqueStringU
00470EC0 8BD0 mov edx,eax
00470EC2 B90A000000 mov ecx,$0000000a
00470EC7 8BC3 mov eax,ebx
00470EC9 8B18 mov ebx,[eax]
00470ECB FF530C call dword ptr [ebx+$0c]
Upvotes: 7
Reputation: 16602
The second option is definitely more "Delphi style" (if you look at the Delphi versions of the Windows API headers, you will see that most pointer parameters have been converted to var parameters).
In addition to that, the second option does not need a cast and is much more readable IMHO.
Upvotes: 6
Reputation: 84540
Definitely the array notation. Part of Delphi style is to make your code easy to read, and it's easier to tell what's going on when you spell out exactly what you're doing. Casting a string to a pointer and then dereferencing it looks confusing; why are you doing that? It doesn't make sense unless the reader knows a lot about string internals.
Upvotes: 18