Reputation: 510
I know how to remove duplicate strings from a TStringList using dupignore for a sorted Tstringlist.
CallData := TStringList.Create;
CallData.Sorted := True;
Call.Duplicates := dupIgnore;
But in my case strings must not be sorted .
Using a FOR loop finding duplicates is very slow (also using indexOF())when TStringList has hundreds of thousands of lines .
if OpenDialog1.Execute then
begin
Try
y := TStringList.create;
f := TStreamReader.create(OpenDialog1.FileName, TEncoding.UTF8, True);
while not f.EndOfStream do
begin
l := f.ReadLine;
X.Add(l);
end;
g := Tstreamwriter.create('d:\logX.txt', True, TEncoding.UTF8);
for I := 0 to X.count - 1 do
begin
if y.IndexOf(X[I]) = -1 then
y.Add(X[I]);
end;
for j := 0 to y.count - 1 do
g.WriteLine(y[j]);
Finally
f.free;
y.free;
g.free;
End;
end;
is there any better way ?
Upvotes: 7
Views: 3182
Reputation: 612924
Here's how I would approach this problem:
If there are a large number of duplicates to be removed then the performance of the above will be affected by repeated removal from the string list. That's because each item to be removed results in the later items being shifted down one index. You can avoid this by copying into a new list rather than deleting inplace.
Alternatively, you can operate in place like this:
Count
to zero.Count
of the list, and then increment Count
.Count
elements.The point of the dictionary is that lookup is an O(1) operation and so the second algorithm has O(n) time complexity.
Upvotes: 6
Reputation: 80013
function compareobjects
(list : Tstringlist;
index1 : integer;
index2 : integer
) : integer;
begin
if index1 = index2 then
result := 0
else
if integer(list.objects[index1]) < integer(list.objects[index2]) then
result := -1
else
result := 1;
end;
begin
Try
y := TStringList.create;
y.Sorted := true;
y.Duplicates := dupignore;
f := TStreamReader.create('c:\106x\q47780823.bat');
i := 0;
while not f.EndOfStream do
begin
inc(i);
line := f.readline;
y.Addobject(line,tobject(i));
end;
y.Sorted := false;
y.CustomSort(compareobjects);
for i := 0 to y.count - 1 do
WriteLn(y[i]);
Finally
f.free;
y.free;
End;
readln;
end.
I'd keep track of the line number (i
) and assign it with the string by casting as an object; sort the list and remove duplicates as before, but then un-sort it using a custom sort on the objects.
Upvotes: 1
Reputation: 6013
I would use trickery, by having a sorted and an unsorted list. Like this:
y := TStringList.create;
s := TStringList.create;
s.Sorted := TRUE;
s.Duplicates := dupIgnore;
f := TStreamReader.create(OpenDialog1.FileName, TEncoding.UTF8, True);
while not f.EndOfStream do
begin
l := f.ReadLine;
s.Add(l);
if s.Count > y.Count then y.Add(l);
end;
// etc.
Upvotes: 3