Kshitij Banerjee
Kshitij Banerjee

Reputation: 1748

Parallel processing - Connected Data

Problem

Summary: Parallely apply a function F to each element of an array where F is NOT thread safe.

I have a set of elements E to process, lets say a queue of them. I want to process all these elements in parallel using the same function f( E ).

Now, ideally I could call a map based parallel pattern, but the problem has the following constraints.

What is the right way of doing this?

My thoughts are like so,

  1. trivial thought: Keep a set of active As and Bs, and start processing an Element only when no other thread is already using A OR B.
    • So, when you give the element to a thread you add the As and Bs to the active set.
    • Pick the first element, if its elements are not in the active set spawn a new thread , otherwise push it to the back of the queue of elements.
    • Do this till the queue is empty.
    • Will this cause a deadlock ? Ideally when a processing is over some elements will become available right?

2.-The other thought is to make a graph of these connected objects. Each node represents an object (A / B) . Each element is an edge connecting A & B, and then somehow process the data such that we know the elements are never overlapping.

Questions

Upvotes: 0

Views: 52

Answers (1)

Daniel
Daniel

Reputation: 1051

The "best" approach depends on a lot of factors here:

  1. How many elements "E" do you have and how much work is needed for f(E). --> Check if it's really worth it to work the elements in parallel (if you need a lot of locking and don't have much work to do, you'll probably slow down the process by working in parallel)
  2. Is there any possibility to change the design that can make f(E) multi-threading safe?
  3. How many elements "A" and "B" are there? Is there any logic to which elements "E" share specific versions of A and B? --> If you can sort the elements E into separate lists where each A and B only appears in a single list, then you can process these lists parallel without any further locking.
  4. If there are many different A's and B's and you don't share too many of them, you may want to do a trivial approach where you just lock each "A" and "B" when entering and wait until you get the lock.

Whenever you do "lock and wait" with multiple locks it's very important that you always take the locks in the same order (e.g. always A first and B second) because otherwise you may run into deadlocks. This locking order needs to be observed everywhere (a single place in the whole application that uses a different order can cause a deadlock)

Edit: Also if you do "try lock" you need to ensure that the order is always the same. Otherwise you can cause a lifelock:

  1. thread 1 locks A
  2. thread 2 locks B
  3. thread 1 tries to lock B and fails
  4. thread 2 tries to lock A and fails
  5. thread 1 releases lock A
  6. thread 2 releases lock B
  7. Goto 1 and repeat...

Chances that this actually happens "endless" are relatively slim, but it should be avoided anyway

Edit 2: principally I guess I'd just split E(Ax, Bx) into different lists based on Ax (e.g one list for all E's that share the same A). Then process these lists in parallel with locking of "B" (there you can still "TryLock" and continue if the required B is already used.

Upvotes: 1

Related Questions