Reputation: 107
I have two arrays. Array1 is holding medical terminologies while array2 is holding plain english terminologies. The medical terminology array has some non-medical english words in it (which are duplicated in array1). These are 20,000 in array1 and over 15000 in array2. What is the fastest way of comparing the two arrays and remove the duplicate words from array1 using delphi?
Upvotes: 0
Views: 1561
Reputation: 4622
If the array is unsorted, this approach is faster: 1. stuff all words from array2 into a dictionary. 2. run through array1, look up each word in your dictionary and remove it if found.
Step 1 and 2 both take O(n) time.
If you have an older version of delphi, you would have to use a dictionary class like a THashedStringList which is in the MemIni.Pas file of your delphi installation. It degenerates for very large N but for like 20.000 entries it is still extremely fast.
A very fast implementation capable of storing and finding 100 of millions of strings in O(1) can be found here. The article is german, but the code is in english and understandable.
Upvotes: 2
Reputation: 1175
If your arrays are sorted, keep two pointers-- p1
and p2
to array1
and array2
resply. Initialize
them to point to the first element of array1
and array2
.
For each entry in array1
starting from the first, see whether it is the word array2[p2].
If it is, simply remove it from array1
. If it isn't,
increment p2
until
( (array1[p1] >= array2[p2]) and (array1[p1] < array2[p2+1]) ),
removing array1[p1]
if you found a hit.
Takes O(n)
time. There are sorting algorithms in O(nlogn)
time.
Upvotes: 2