Rakesh Agarwal
Rakesh Agarwal

Reputation: 3059

Parallel For Loops: Find if a sorted array contains duplicate elements

It's a huge array that contains elements in the ascending order. We need to find if the array contains any duplicate elements. The brute force approach is simple that when we traverse the loop and if for any given index i, a[i] == a[i+1], we can early return indicating that the array contain duplicate elements.

However, in a multi-core environment, we can definitely improve the performance by having multiple for loops running in parallel, each working on a section of the input loop. How do we achieve synchronization there? How would early return work?

Upvotes: 0

Views: 289

Answers (1)

NathanOliver
NathanOliver

Reputation: 180710

You'll have to check if the performance of having a synchronization variable that tells the loops to end early is better then just letting the threads run through the subset of the array and returning if they find something. If you don't have a lock free atomic bool then letting the threads run to the end of the subset and then synchronizing will probably be faster.

The algorithm itself is pretty easy, you just need to make sure you handle the corner case. Lets say your data set is

1, 2, 3, 4, 4, 6, 7, 8

and you split it up into 4 even threads. The threads would get

thread 1: 1, 2
thread 2: 3, 4
thread 3: 4, 6
thread 4: 7, 8

and each thread would not find a duplicate since the duplicate entries span two different threads. What that means is that each thread needs to have a subset that overlaps the other subsets by an element so you can find the duplicate. That means the threads should really be laid out like

thread 1: 1, 2, 3
thread 2: 3, 4, 4
thread 3: 4, 6, 7
thread 4: 7, 8 // the last thread does need to overlap anything

Having that you can run through each subset lineally checking a[i] == a[i+1] and you are guaranteed to find a duplicate if one exists.


You don't need to worry about synchronizing the read operation on the contents of the vector. As long as the vector is not modified while you are checking it, it is okay to have multiple readers reading from it at the same time. You only need synchronization when you have more than one thread and one or more of those threads writes to shared data.

Upvotes: 4

Related Questions