Dan Nissenbaum

Reputation: 13908

Passing arrays, without overhead (preferably "by reference"), to avoid duplicating complex code blocks, in matlab?

I have complex code blocks, in a Matlab script, that act on large, non-sparse arrays. The code performs many write operations to random elements in the arrays, as well as read operations. The identical code must execute against different (large) arrays (i.e., the same code blocks, except for different array variable names).

I do not want to have long, duplicated code blocks that differ only in the array names.

Unfortunately, when I create a function to perform the operations, so that the code block appears only once, the performance slows down by a factor of 10 or more (presumably due to the copying of the array). However, I do not need the array copied. I would prefer to "pass by reference", so that the purpose of the function call is ONLY to avoid having duplicated code blocks. There seems to be no way to avoid the copy-on-write semantics, however.

Also, it is impossible (so far as I understand) to create a script (not a function) to achieve this, because the script must use the same variable names as the calling script. I would therefore need a different script for every array on which I wish to run the code, which gains nothing (I would still have duplicated code blocks).

I have looked into creating an alias variable name to "substitute" for the array variable name of interest, in which case I could call a script and avoid duplicated code. However, I cannot find any way to create an alias in Matlab.

Finally, I have tried writing a function that uses evalin(), passing it the name of the array variable as a string. This works, but the performance is also vastly slower, about the same as passing the arrays by value to a function (at least a 10x slowdown).

I am coming to the conclusion that, in Matlab, it is impossible to avoid duplicating code blocks when performing complex operations on non-sparse arrays, because every technique for avoiding the duplication seems to incur ghastly overhead.

I find this hard to believe, but I cannot find a way around it.

Does anybody know of a way to avoid duplicated code blocks when performing identical intricate operations on multiple non-sparse arrays in Matlab?

Upvotes: 8

Views: 3686

Answers (5)

VBel

Reputation: 241

Another answer:

There is a good article, "In-place Operations on Data". Apparently, there are two pitfalls:

  1. (This is trivial and you probably did it.) You should use the same input and output variable name not only in the definition of the function, but also where you call it.
  2. This only works if you call your function from ANOTHER FUNCTION, not from the command line. Weird... I tried it, and, though there is an overhead, it is very small (for 10000-by-10000 arrays it took 1 sec from the command line and 0.000361 sec from another function).

If this does not work for you, you may use an undocumented feature that allows you to do in-place operations in a C++ MEX file. This is nasty, but here is an article about exactly that: Matlab mex in-place editing

Upvotes: 2

angainor

Reputation: 11810

As noted by Loren on her blog, MATLAB does support in-place operations on matrices, which essentially covers passing arrays by reference, modifying them in a function, and returning the result. You seem to know that, but you erroneously state that it cannot be used because the code must contain the same variable names as the calling script. Here is a code example that shows this is wrong. When testing, please copy it verbatim and save it as a function:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc
end

function x=computeIP(x)
x = x+1;
end

function y=compute(x)
y = x+1;
end

Time results on my computer:

Elapsed time is 0.243335 seconds.
Elapsed time is 0.251495 seconds.
Elapsed time is 0.090949 seconds.
Elapsed time is 0.088894 seconds.
Elapsed time is 0.090638 seconds.

As you can see, the third and fourth calls, which use the in-place function, are equally fast for both input arrays x and y. They are also as fast as running x = x+1 without a function. The only important thing is that, inside the function, the input and output parameters are the same variable. And there is one more thing...

If I had to guess what is wrong with your code, I'd say you made nested functions that you expect to operate in place. They do not. So the code below will not run in place:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc

    function x=computeIP(x)
        x = x+1;
    end

    function y=compute(x)
        y = x+1;
    end
end

Elapsed time is 0.247798 seconds.
Elapsed time is 0.257521 seconds.
Elapsed time is 0.229774 seconds.
Elapsed time is 0.237215 seconds.
Elapsed time is 0.090446 seconds.

The bottom line: be careful with those nested functions.
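Applied to the question's use case (writes to scattered elements of several large arrays), the in-place pattern might look like the sketch below. The names inplace_usage_sketch and writeElements, the array sizes, and the index vector are hypothetical and chosen only for illustration. The sketch assumes the code is saved as a function file, that the helper is a local (not nested) function, and that the caller assigns the result back to the same variable it passed in.

```matlab
function [a, b] = inplace_usage_sketch
% Hypothetical driver. It is a function, not a script, because the
% in-place optimization reportedly applies only to calls made from functions.
a = zeros(1, 1e6);
b = zeros(1, 1e6);
idx = randi(numel(a), 1, 1000);   % illustrative random element positions

% Same variable name on both sides of each call.
a = writeElements(a, idx, 1);
b = writeElements(b, idx, 2);
end

function x = writeElements(x, idx, v)
% Input and output share the name x, so MATLAB may update the
% caller's array in place instead of copying it first.
x(idx) = v;
end
```

The single helper serves both arrays, so the code block is written once. Whether the copy is actually avoided depends on the MATLAB version, so it is worth timing on your own setup as in the tests above.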

Upvotes: 9

grantnz

Reputation: 7423

The handle solution suggested by Brian L does work, although the first call that modifies the wrapped data takes a long time (because it has to make a copy of the original data).

Try this:

SomeData.m

classdef SomeData < handle
    properties
        X
    end
    methods
        function obj = SomeData(x)
            if nargin > 0
                obj.X = x;
            else
                obj.X = [];
            end
        end
    end
end

LargeOp.m

function directArray = LargeOp( someData, directArray )
    if nargin > 1
        directArray(1,1) = rand(1);
    else
        someData.X(1,1) = rand(1);
        directArray = [];    
    end
end

Script to test performance

large = zeros(10000,10000);

data = SomeData(large);

tic
LargeOp(data);
toc

tic
large = LargeOp(data,large);
toc

tic
LargeOp(data);
toc

tic
large = LargeOp(data,large);
toc

Results

Elapsed time is 0.364589 seconds.
Elapsed time is 0.450668 seconds.
Elapsed time is 0.001073 seconds.
Elapsed time is 0.443150 seconds.

Upvotes: 2

VBel

Reputation: 241

You could put all of your arrays into a single cell array and index into it instead of referring to the arrays by name. A function will still copy the arrays, but a script can do the job.
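A minimal sketch of this idea, assuming two hypothetical arrays of arbitrary size. The shared code block refers only to arrays{k}, so the same lines serve every array. Here the block is shown as a loop body, but it could equally live in a separate script that reads k from the calling workspace.

```matlab
% Hypothetical data: the arrays live in one cell array instead of
% separate named variables.
arrays = {zeros(500), zeros(300)};

for k = 1:numel(arrays)
    % Shared code block: it never mentions a specific array name.
    n = numel(arrays{k});
    arrays{k}(1) = k;     % write to the first element
    arrays{k}(n) = -k;    % write to the last element
end
```

This has not been timed here, so whether the indexed writes avoid copies should be checked with tic/toc on the real data.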

Upvotes: 4

drhagen

Reputation: 9532

Depending on your needs, you can accomplish this by making a nested function.

function A = evensarenegative(n)
    A = zeros(n,1);

    for i = 1:n
        if mod(i,2)
            nested1(i)
        else
            nested2(i)
        end
    end

    function nested1(i)
        A(i) = i;
    end

    function nested2(i)
        A(i) = -i;
    end
end

Here, the functions share the same workspace, in particular the A matrix, so no variables are ever copied. I find it to be a convenient way to organize code, especially when I have a lot of minor (but possibly verbose) operations as part of a larger workflow.

Upvotes: 1
