user1817376
user1817376

Reputation: 107

How to compare two arrays in delphi and remove duplicates from one

I have two arrays. Array1 is holding medical terminologies while array2 is holding plain english terminologies. The medical terminology array has some non-medical english words in it (which are duplicated in array1). These are 20,000 in array1 and over 15000 in array2. What is the fastest way of comparing the two arrays and remove the duplicate words from array1 using delphi?

Upvotes: 0

Views: 1561

Answers (2)

alzaimar
alzaimar

Reputation: 4622

If the array is unsorted, this approach is faster: 1. stuff all words from array2 into a dictionary. 2. run through array1, look up each word in your dictionary and remove it if found.

Step 1 and 2 both take O(n) time.

If you have an older version of delphi, you would have to use a dictionary class like a THashedStringList which is in the MemIni.Pas file of your delphi installation. It degenerates for very large N but for like 20.000 entries it is still extremely fast.

A very fast implementation capable of storing and finding 100 of millions of strings in O(1) can be found here. The article is german, but the code is in english and understandable.

Upvotes: 2

ashley
ashley

Reputation: 1175

If your arrays are sorted, keep two pointers-- p1 and p2 to array1 and array2 resply. Initialize them to point to the first element of array1 and array2.

For each entry in array1 starting from the first, see whether it is the word array2[p2]. If it is, simply remove it from array1. If it isn't, increment p2 until

( (array1[p1] >= array2[p2]) and (array1[p1] < array2[p2+1]) ),  

removing array1[p1] if you found a hit.

Takes O(n) time. There are sorting algorithms in O(nlogn) time.

Upvotes: 2

Related Questions