JayX
JayX

Reputation: 1784

Finding the largest element in a list with K processes using Erlang?

It is easy to implement the algorithm using a single process, however, how can I use multiple processes to do the job?

Here is what I have done so far.

 find_largest([H], _) -> H;
 find_largest([H, Q | T], R) ->
     if H > Q -> find_largest([H | T], [Q | R]);
       true -> find_largest([Q | T], [H | R])
 end.

Thanks

Upvotes: 4

Views: 586

Answers (2)

David Weldon
David Weldon

Reputation: 64312

The single process version of your code should be replaced by lists:max/1. A useful function for parallelizing code is as follows:

pmap(Fun, List) ->
    Parent = self(),
    P = fun(Elem) ->
            Ref = make_ref(),
            spawn_link(fun() -> Parent ! {Ref, Fun(Elem)} end),
            Ref
        end,
    Refs = [P(Elem) || Elem <- List],
    lists:map(fun(Ref) -> receive {Ref, Elem} -> Elem end end, Refs).

pmap/2 applies Fun to each member of List in parallel and collects the results in input order. To use pmap with this problem, you would need to segment your original list into a list of lists and pass that to pmap. e.g. lists:max(pmap(fun lists:max/1, ListOfLists)). Of course, the act of segmenting the lists would be more expensive than simply calling lists:max/1, so this solution would require that the list be pre-segmented. Even then, it's likely that the overhead of copying the lists outweighs any benefit of parallelization - especially on a single node.

The inherent problem with your situation is that the computation of each sub-task is tiny when compared with the overhead of managing the data. Tasks which are more computationally intensive, (e.g. factoring a list of large numbers), are more easily parallelized.

This isn't to say that finding a max value can't be parallelized, but I believe it would require that your data be pre-segmented or segmented in a way that didn't require iterating over every value.

Upvotes: 2

YOUR ARGUMENT IS VALID
YOUR ARGUMENT IS VALID

Reputation: 2059

Given how Erlang represents lists, this is probably not a good idea to try and do in parallel. Partitioning the list implies a lot of copying (since they are linked lists) and so does sending these partitions to other processes. I expect the comparison to be far cheaper than copying everything twice and then combining the results.

The implementation is also not correct, you can find a good one in lists.erl as max/1

%% max(L) -> returns the maximum element of the list L

-spec max([T,...]) -> T.

max([H|T]) -> max(T, H).

max([H|T], Max) when H > Max -> max(T, H);
max([_|T], Max)              -> max(T, Max);
max([],    Max)              -> Max.

If by some chance your data are already in separate processes, simply get the lists:max/1 or each of the lists and send them to a single place, and then get the lists:max/1 of the result list. You could also do the comparison as you receive the results to avoid building this intermediate list.

Upvotes: 5

Related Questions