levi Clouser
levi Clouser

Reputation: 358

TStringList enable Binary search without resorting?

I am building a stringlist from an ADO query, in the query it is much faster to return sorted results and then add them in order. this gives me an already sorted list and then calling either Sort or setting sorted true costs me time as the Quicksort algorithm does not preform well on an already sorted list. Is there some way to set the TStringList to use the Binary search without running the sort? before you ask I don't have access to the CustomSort attribute.

Upvotes: 0

Views: 765

Answers (3)

dummzeuch
dummzeuch

Reputation: 11217

I was about to suggest using an interposer class to directly change the FSorted field without calling its setter method which as a side effect calls the Sort method. But looking at the implementation of TStringList in Delphi 2007, I found that Find will always do a binary search without checking the Sorted property. This will, of course fail, if the list items aren't sorted, but in your case they are. So, as long as you remember to call Find rather than IndexOf, you don't need to do anything.

Upvotes: 0

MartynA
MartynA

Reputation: 30715

I am not sure I understand what you are worried about, assuming the desired sort order of the StringList is the same as the ORDER BY of the AdoQuery.

Surely the thing to do is to set Sorted on your StringList to True while it is still empty and then insert the rows from the AdoQuery. That way, when the StringList is about to Add an entry, it will search for it using IndexOf, which will in turn use Find, which does a binary search, to check for duplicates. But using Add in this way does not involve a quicksort because the StringList is already treating itself as sorted.

In view of your comments and your own answer I ran the program below through the Line Timer profiler in NexusDB's Quality Suite. The result is that although there are detectable differences in execution speed using a binary search versus TStringList.IndexOf, they are nothing to do with the use (or not) of TStringList's QuickSort. Further, the difference is explicable by a subtle difference between how the binary search I used and the one in TStringList.Find work - see Update #2 below.

The program generates 200k 100-character strings and then inserts them into a StringList. The StringList is generated in two ways, first with Sorted set to True before any strings are added and then with Sorted set to True only after the strings have been added. StringList.IndexOf and your BinSearch is then used to look up each of the strings which has been added. The results are as follows:

Line    Total Time  Source
80      procedure Test;
119    0.000549 begin
120 2922.105618   StringList := GetList(True);
121 2877.101652   TestIndexOf;
122 1062.461975   TestBinSearch;
123   29.299069   StringList.Free;
124     
125 2970.756283   StringList := GetList(False);
126 2943.510851   TestIndexOf;
127 1044.146265   TestBinSearch;
128   31.440766   StringList.Free;
129             end;
130     
131     begin
132       Test;
133     end.

The problem I encountered is that your BinSearch never actually returns 1 and the number of failures is equal to the number of strings searched for. If you can fix this, I'll be happy to re-do the test.

program SortedStringList2;
[...]
const
  Rows = 200000;
  StrLen = 100;

function ZeroPad(Number : Integer; Len : Integer) : String;
begin
  Result := IntToStr(Number);
  if Length(Result) < Len then
    Result := StringOfChar('0', Len - Length(Result)) + Result;
end;

function GetList(SortWhenEmpty : Boolean) : TStringList;
var
  i : Integer;
begin
  Result := TStringList.Create;
  if SortWhenEmpty then
    Result.Sorted := True;
  for i := 1 to Rows do
    Result.Add(ZeroPad(i, StrLen));
  if not SortWhenEmpty then
    Result.Sorted := True;
end;

Function BinSearch(slList: TStringList; sToFind : String) : integer;
var
 i, j, k  : integer;
begin
  try
    i := slList.Count div 2;
    k := i;
    if i = 0 then
    begin
      Result := -1;
      // SpendLog('BinSearch List Empty, Exiting...');
      exit;
    end;

while slList.Strings[i] <> sToFind do
begin
  if CompareText(slList.Strings[i], sToFind) < 0 then
  begin
    j := i;
    k := k div 2;
    i := i + k;
    if j=i then
      break;
  end else
  if CompareText(slList.Strings[i], sToFind) > 0 then
  begin
    j := i;
    k := k div 2;
    i := i - k;
    if j=i then
      break;
  end else
    break;
end;

if slList.Strings[i] = sToFind then
  result := i
else
  Result := -1;

 except
    //SpendLog('<BinSearch> Exception: ' + ExceptionMessage + ' At Line: ' + Analysis.LastSourcePos);
 end;

end;

procedure Test;
var
  i : Integer;
  StringList : TStringList;

  procedure TestIndexOf;
  var
    i : Integer;
    Index : Integer;
    Failures : Integer;
    S : String;
  begin
    Failures := 0;
    for i := 1 to Rows do begin
      S := ZeroPad(i, StrLen);
      Index := StringList.IndexOf(S);
      if Index < 0 then
        Inc(Failures);
    end;
    Assert(Failures = 0);
  end;

  procedure TestBinSearch;
  var
    i : Integer;
    Index : Integer;
    Failures : Integer;
    S : String;
  begin
    Failures := 0;
    for i := 1 to Rows do begin
      S := ZeroPad(i, StrLen);
      Index := BinSearch(StringList, S);
      if Index < 0 then
        Inc(Failures);
    end;
    //Assert(Failures = 0);
  end;

begin
  StringList := GetList(True);
  TestIndexOf;
  TestBinSearch;
  StringList.Free;

  StringList := GetList(False);
  TestIndexOf;
  TestBinSearch;
  StringList.Free;
end;

begin
  Test;
end.

Update I wrote my own implementation of the search algorithm in the Wikipedia article https://en.wikipedia.org/wiki/Binary_search_algorithm as follows:

function BinSearch(slList: TStringList; sToFind : String) : integer;
var
  L, R, m : integer;
begin
  L := 0;
  R := slList.Count - 1;
  if R < L then begin
    Result := -1;
    exit;
  end;

  m := (L + R) div 2;
  while slList.Strings[m] <> sToFind do begin
    m := (L + R) div 2;
    if CompareText(slList.Strings[m], sToFind) < 0 then
      L := m + 1
    else
      if CompareText(slList.Strings[m], sToFind) > 0 then
        R := m - 1;
    if L > R then
      break;
  end;

  if slList.Strings[m] = sToFind then
    Result := m
  else
    Result := -1;
end;

This seems to work correctly, and re-profiling the test app using this gave these results:

Line    Total Time  Source
113     procedure Test;
153    0.000490 begin
154 3020.588894   StringList := GetList(True);
155 2892.860499   TestIndexOf;
156 1143.722379   TestBinSearch;
157   29.612898   StringList.Free;
158     
159 2991.241659   StringList := GetList(False);
160 2934.778847   TestIndexOf;
161 1113.911083   TestBinSearch;
162   30.069241   StringList.Free;

On that showing, a binary search clearly outperforms TStringList.IndexOf and contrary to my expectations it makes no real difference whether TStringList.Sorted is set to True before or after the strings are added.

Update#2 it turns out that the reason BinSearch is faster than TStringList.IndexOf is purely because BinSearch uses CompareText whereas TStringList.IndexOf uses AnsiCompareText (via .Find). If I change BinSearch to use AnsiCompareText, it becomes 1.6 times slower than TStringList.IndexOf!

Upvotes: 2

levi Clouser
levi Clouser

Reputation: 358

in the end I just hacked up a binary search to check the stringlist like an array:

Function BinSearch(slList: TStringList; sToFind : String) : integer;
var 
 i, j, k  : integer;
begin
 try
  try 
    i := slList.Count div 2;    
    k := i;
    if i = 0 then
    begin
      Result := -1;
      SpendLog('BinSearch List Empty, Exiting...');
      exit;
    end;

while slList.Strings[i] <> sToFind do
begin  
  if CompareText(slList.Strings[i], sToFind) < 0 then  
  begin    
    j := i; 
    k := k div 2;
    i := i + k;
    if j=i then
      break; 
  end else
  if CompareText(slList.Strings[i], sToFind) > 0 then 
  begin
    j := i;
    k := k div 2;
    i := i - k;
    if j=i then
      break; 
  end else
    break;
end;

if slList.Strings[i] = sToFind then
  result := i
else
  Result := -1;


except
    SpendLog('<BinSearch> Exception: ' + ExceptionMessage + ' At Line: ' + Analysis.LastSourcePos);

  end;

 finally

 end;  


end;

I'll clean this up later if needed.

Upvotes: -1

Related Questions