Reputation: 358
I am building a stringlist from an ADO query, in the query it is much faster to return sorted results and then add them in order. this gives me an already sorted list and then calling either Sort or setting sorted true costs me time as the Quicksort algorithm does not preform well on an already sorted list. Is there some way to set the TStringList to use the Binary search without running the sort? before you ask I don't have access to the CustomSort attribute.
Upvotes: 0
Views: 765
Reputation: 11217
I was about to suggest using an interposer class to directly change the FSorted field without calling its setter method which as a side effect calls the Sort method. But looking at the implementation of TStringList in Delphi 2007, I found that Find will always do a binary search without checking the Sorted property. This will, of course fail, if the list items aren't sorted, but in your case they are. So, as long as you remember to call Find rather than IndexOf, you don't need to do anything.
Upvotes: 0
Reputation: 30715
I am not sure I understand what you are worried about, assuming the desired sort order of the StringList is the same as the ORDER BY of the AdoQuery.
Surely the thing to do is to set Sorted on your StringList to True while it is still empty and then insert the rows from the AdoQuery. That way, when the StringList is about to Add an entry, it will search for it using IndexOf, which will in turn use Find, which does a binary search, to check for duplicates. But using Add in this way does not involve a quicksort because the StringList is already treating itself as sorted.
In view of your comments and your own answer I ran the program below through the Line Timer profiler in NexusDB's Quality Suite. The result is that although there are detectable differences in execution speed using a binary search versus TStringList.IndexOf
, they are nothing to do with the use (or not) of TStringList
's QuickSort. Further, the difference is explicable by a subtle difference between how the binary search I used and the one in TStringList.Find
work - see Update #2 below.
The program generates 200k 100-character strings and then inserts them into a StringList. The StringList is generated in two ways, first with Sorted set to True before any strings are added and then with Sorted set to True only after the strings have been added. StringList.IndexOf
and your BinSearch
is then used to look up each of the strings which has been added. The results are as follows:
Line Total Time Source
80 procedure Test;
119 0.000549 begin
120 2922.105618 StringList := GetList(True);
121 2877.101652 TestIndexOf;
122 1062.461975 TestBinSearch;
123 29.299069 StringList.Free;
124
125 2970.756283 StringList := GetList(False);
126 2943.510851 TestIndexOf;
127 1044.146265 TestBinSearch;
128 31.440766 StringList.Free;
129 end;
130
131 begin
132 Test;
133 end.
The problem I encountered is that your BinSearch
never actually returns 1
and the number of failures is equal to the number of strings searched for. If you can fix this, I'll be happy to re-do the test.
program SortedStringList2;
[...]
const
Rows = 200000;
StrLen = 100;
function ZeroPad(Number : Integer; Len : Integer) : String;
begin
Result := IntToStr(Number);
if Length(Result) < Len then
Result := StringOfChar('0', Len - Length(Result)) + Result;
end;
function GetList(SortWhenEmpty : Boolean) : TStringList;
var
i : Integer;
begin
Result := TStringList.Create;
if SortWhenEmpty then
Result.Sorted := True;
for i := 1 to Rows do
Result.Add(ZeroPad(i, StrLen));
if not SortWhenEmpty then
Result.Sorted := True;
end;
Function BinSearch(slList: TStringList; sToFind : String) : integer;
var
i, j, k : integer;
begin
try
i := slList.Count div 2;
k := i;
if i = 0 then
begin
Result := -1;
// SpendLog('BinSearch List Empty, Exiting...');
exit;
end;
while slList.Strings[i] <> sToFind do
begin
if CompareText(slList.Strings[i], sToFind) < 0 then
begin
j := i;
k := k div 2;
i := i + k;
if j=i then
break;
end else
if CompareText(slList.Strings[i], sToFind) > 0 then
begin
j := i;
k := k div 2;
i := i - k;
if j=i then
break;
end else
break;
end;
if slList.Strings[i] = sToFind then
result := i
else
Result := -1;
except
//SpendLog('<BinSearch> Exception: ' + ExceptionMessage + ' At Line: ' + Analysis.LastSourcePos);
end;
end;
procedure Test;
var
i : Integer;
StringList : TStringList;
procedure TestIndexOf;
var
i : Integer;
Index : Integer;
Failures : Integer;
S : String;
begin
Failures := 0;
for i := 1 to Rows do begin
S := ZeroPad(i, StrLen);
Index := StringList.IndexOf(S);
if Index < 0 then
Inc(Failures);
end;
Assert(Failures = 0);
end;
procedure TestBinSearch;
var
i : Integer;
Index : Integer;
Failures : Integer;
S : String;
begin
Failures := 0;
for i := 1 to Rows do begin
S := ZeroPad(i, StrLen);
Index := BinSearch(StringList, S);
if Index < 0 then
Inc(Failures);
end;
//Assert(Failures = 0);
end;
begin
StringList := GetList(True);
TestIndexOf;
TestBinSearch;
StringList.Free;
StringList := GetList(False);
TestIndexOf;
TestBinSearch;
StringList.Free;
end;
begin
Test;
end.
Update I wrote my own implementation of the search algorithm in the Wikipedia article https://en.wikipedia.org/wiki/Binary_search_algorithm as follows:
function BinSearch(slList: TStringList; sToFind : String) : integer;
var
L, R, m : integer;
begin
L := 0;
R := slList.Count - 1;
if R < L then begin
Result := -1;
exit;
end;
m := (L + R) div 2;
while slList.Strings[m] <> sToFind do begin
m := (L + R) div 2;
if CompareText(slList.Strings[m], sToFind) < 0 then
L := m + 1
else
if CompareText(slList.Strings[m], sToFind) > 0 then
R := m - 1;
if L > R then
break;
end;
if slList.Strings[m] = sToFind then
Result := m
else
Result := -1;
end;
This seems to work correctly, and re-profiling the test app using this gave these results:
Line Total Time Source
113 procedure Test;
153 0.000490 begin
154 3020.588894 StringList := GetList(True);
155 2892.860499 TestIndexOf;
156 1143.722379 TestBinSearch;
157 29.612898 StringList.Free;
158
159 2991.241659 StringList := GetList(False);
160 2934.778847 TestIndexOf;
161 1113.911083 TestBinSearch;
162 30.069241 StringList.Free;
On that showing, a binary search clearly outperforms TStringList.IndexOf
and contrary to my expectations it makes no real difference whether TStringList.Sorted
is set to True before or after the strings are added.
Update#2 it turns out that the reason BinSearch
is faster than TStringList.IndexOf
is purely because BinSearch
uses CompareText
whereas TStringList.IndexOf
uses AnsiCompareText
(via .Find
). If I change BinSearch
to use AnsiCompareText
, it becomes 1.6 times slower than TStringList.IndexOf
!
Upvotes: 2
Reputation: 358
in the end I just hacked up a binary search to check the stringlist like an array:
Function BinSearch(slList: TStringList; sToFind : String) : integer;
var
i, j, k : integer;
begin
try
try
i := slList.Count div 2;
k := i;
if i = 0 then
begin
Result := -1;
SpendLog('BinSearch List Empty, Exiting...');
exit;
end;
while slList.Strings[i] <> sToFind do
begin
if CompareText(slList.Strings[i], sToFind) < 0 then
begin
j := i;
k := k div 2;
i := i + k;
if j=i then
break;
end else
if CompareText(slList.Strings[i], sToFind) > 0 then
begin
j := i;
k := k div 2;
i := i - k;
if j=i then
break;
end else
break;
end;
if slList.Strings[i] = sToFind then
result := i
else
Result := -1;
except
SpendLog('<BinSearch> Exception: ' + ExceptionMessage + ' At Line: ' + Analysis.LastSourcePos);
end;
finally
end;
end;
I'll clean this up later if needed.
Upvotes: -1