Rickson

Reputation: 1090

Clearing selected variables from MATLAB's workspace in a performant way

I have a question regarding increasing the performance of clearing no longer required variables from MATLAB's workspace.

Assume that some variables are present in the workspace that shall be kept. Those variables can be stored as follows:

VarsToKeep = who;

Also assume that after storing the variables, lots of new variables are generated. From time to time, I need to clear those newly generated variables which I currently do as follows:

eval(['clearvars -except ' strjoin(VarsToKeep, ' ')]);

However, this operation takes quite long (more than 5 s in my case).

Since performance is a major issue in my environment, I would like to know whether there is a more performant MATLAB command for that operation.

Upvotes: 2

Views: 441

Answers (1)

Suever

Reputation: 65430

Rather than using eval you can simply use the function syntax () to call clearvars and pass your VarsToKeep to it using {:} indexing to create a comma-separated list.

clearvars('-except', VarsToKeep{:});
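
To make the {:} expansion concrete, here is a minimal sketch wrapped in a function so it does not touch your base workspace. The function name demoClearvarsExcept and the variable names a, b, tmp1 and tmp2 are made up for illustration. Note that VarsToKeep itself is not in the list returned by who (it did not exist yet when who ran), so I pass it explicitly here; that is my assumption about what you want to keep.

```matlab
function remaining = demoClearvarsExcept()
    % Hypothetical demo; all names here are illustrative only.
    a = 1;                          % variables we want to keep
    b = 2;
    VarsToKeep = who;               % cell array of the names that exist right now

    tmp1 = rand(3);                 % variables created afterwards
    tmp2 = 'scratch';

    % VarsToKeep{:} expands into separate arguments ('a', 'b'), so this
    % is equivalent to clearvars('-except', 'VarsToKeep', 'a', 'b')
    clearvars('-except', 'VarsToKeep', VarsToKeep{:});

    remaining = who;                % names that survived the clear
end
```

Calling remaining = demoClearvarsExcept() should return only the names a, b and VarsToKeep.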

As for why it is slow, that really depends on how many variables you are removing. The more variables you have, the longer it takes to clear them.

Update

As you've stated in the comments, you have close to 500 variables that you are trying to prevent from being cleared. If we look at the internals of clearvars, there are a few reasons why specifying so many variables to exclude is slow.

Regular Expressions

Internally, clearvars creates a regular expression to determine which variables to keep. In your case, you have specified all of the variable names explicitly so it must concatenate them into a big regular expression.

If for example, we wanted to keep the variables A, B, and C, this regular expression would look something like:

'^(?!(A|B|C)$).'

And this regular expression basically only matches things that are not our variables of interest:

regexp({'A', 'AA', 'B', 'AB', 'C', 'D'}, '^(?!(A|B|C)$).')

%    []    [1]    []    [1]    []    [1]

This regular expression is just passed to the built-in clear to do the actual clearing in the following way:

clear -regexp '^(?!(A|B|C)$).'

clear then has to compare every variable name against this regular expression to determine whether to remove it.
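
If you want to see which names such a pattern would select without clearing anything, something like the following sketch works. The helper name namesMatchedByExceptRegex is hypothetical, and it only mimics the pattern shown above with regexp; it is not the actual clearvars source.

```matlab
function toClear = namesMatchedByExceptRegex(allNames, keepNames)
    % Hypothetical helper: returns the names in allNames that the
    % "everything except keepNames" pattern would match (i.e. clear).

    % Build the negative-lookahead pattern, e.g. '^(?!(A|B|C)$).'
    pattern = sprintf('^(?!(%s)$).', strjoin(reshape(keepNames, 1, []), '|'));

    % regexp returns one start index (or []) per name when given 'once'
    startIdx = regexp(allNames, pattern, 'once');

    % A non-empty result means the name is NOT in the keep list
    isMatch = ~cellfun(@isempty, startIdx);
    toClear = allNames(isMatch);
end
```

For example, namesMatchedByExceptRegex({'A', 'AA', 'B', 'AB', 'C', 'D'}, {'A', 'B', 'C'}) selects AA, AB and D, which lines up with the regexp output shown earlier.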

Keep in mind that regular expressions are not computationally cheap, and they get more expensive as they grow more complex. Every variable you add to the exclusion list makes this regular expression longer, and it has to be evaluated against each variable in your workspace, so the cost adds up quickly.

Just as a proof of concept, if you had 25 variables of 25 characters each, the regex would look like this:

^(?!(onhwbcwijwjjoxyowepmnjeac|jkowjywrerpfamjpdtcisttpy|qtaihttmztryenwyfdzhnunsw|fyhvhmybvbqulietxwitalcjd|noeszudvzieizcbvpraycicnt|gkhdpwticanasbjfyrgjytzlp|nfrrgwghhalhlzawaqtqzdxkd|ritwzxekjcctmyooeuoutufod|sfuimpzzcavgxyuhqhbrttrjn|zquelkgrexmgbogtzegyineay|qyjuxjkfkpnluafyownikibtv|xxoprkwzvrkkvcozvvlhhlaft|nkvuoaxsiztuixtbmmbdaoijb|hdsdopqyndjsbuvefvkcxzohl|pzlitikbyysnpwewzraiifmgi|zfucjwulrnzluxqyohsdmophc|gbdiiftvfbsqoregmmzpemadw|rjdfzlznmshxpvbvqhcwhsuud|ekbluwjwevpgsjbnqjvzybxul|jfmgqyvomuhprelxnolizptxn|iyhnvmdyvrenfhpsmfawqvqga|jcfbtajkidnimopxmawzblfmq|yyccqfoftjqmcbeainaeweeyk|jelyftqgcqkepnpyzdkrpqpam|mbucicotugqiksqkpgryhzwev)$).

If you want to benchmark how long just the regular expression part takes, you can do the following:

% Construct a regular expression of all of your variables
regex = sprintf('^(?!(%s)$).', strjoin(VarsToKeep, '|'));

% Now match all variables to this regex and time it
tic
matches = regexp(VarsToKeep, regex);
toc

This explains why it's still terribly slow even if you are keeping every variable in your workspace: MATLAB still has to construct this giant regular expression and test every variable name against it, only to find that you excluded all of them.

Note that this is just the matching overhead; it doesn't include the time needed to actually clear the underlying data.

An Alternative

Rather than using all of this regular expression matching, it will likely be faster to get a list of variables before and after you run your code, then use setdiff or ismember to determine the ones that were added and then clear these explicitly with clear.

% Keep track of the variables before we started
beforeVars = who;

% Do stuff

% Get the list of variables after we're done
afterVars = who;

% Figure out which ones were added
toRemove = afterVars(~ismember(afterVars, beforeVars));

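% Now clear these variables explicitly (no regular expressions involved).
% The isempty check matters: clear() with no arguments clears everything.
if ~isempty(toRemove)
    clear(toRemove{:})
end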

This is still going to take a long time if you have large variables defined in your workspace, but at least you aren't wasting much time identifying which variables to remove.
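
The same idea with setdiff instead of ismember might look like the sketch below. It is wrapped in a function purely so it can be run without touching your base workspace; the function name demoClearNewVariables and the variable names are made up for illustration. Two details are my own additions rather than anything from your setup: I add 'beforeVars' to its own list so the bookkeeping variable survives, and I guard against an empty list because clear() with no arguments clears everything.

```matlab
function remaining = demoClearNewVariables()
    % Hypothetical demo; all names here are illustrative only.
    keepA = 1;                              % variables that exist up front
    keepB = 2;

    % Snapshot the current names, plus the snapshot variable itself
    beforeVars = [who; {'beforeVars'}];

    tmpX = rand(2);                         % "do stuff": new variables appear
    tmpY = 'scratch';

    % Names present now that were not present in the snapshot
    afterVars = who;
    toRemove = setdiff(afterVars, beforeVars);

    % Guard: clear() with an empty argument list would clear everything
    if ~isempty(toRemove)
        clear(toRemove{:});
    end

    remaining = who;                        % names that survived the clear
end
```

After the call, tmpX and tmpY are gone while keepA, keepB and the bookkeeping lists (beforeVars, afterVars, toRemove) are still there.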

A Benchmark for Good Measure

I was actually curious what kind of performance I would get, so I put together a quick benchmark. Essentially, I create N variables in the base workspace (with random names of a fixed length) and assign each a random scalar value (so clearing the data itself should take essentially no time). Then I apply the two approaches to remove half of them.

function testclear
    % Range of sizes to test (I don't have all day so I only tested 5)
    nVars = round(linspace(1, 500, 5));

    times1 = zeros(1, numel(nVars));
    times2 = zeros(1, numel(nVars));

    for n = 1:numel(nVars)

        %% TEST THE CLEARVARS WAY

        % Now create twice as many variables (we will clear half)
        createVariables(2 * nVars(n));

        % Now we're going to clear half the variables and time it
        tic
        evalin('base', 'clearvars(''-except'', W{1:ceil(numel(W)/2)})');
        times1(n) = toc;

        % Now clear everything for the next run
        evalin('base', 'clear(W{:})');

        %% EXPLICITLY PASS TO CLEAR

        createVariables(2 * nVars(n));

        evalin('base', 'beforeVars = W(1:ceil(numel(W)/2));')
        evalin('base', 'afterVars = W((ceil(numel(W)/2) + 1):end);')

        tic
        evalin('base', 'toRemove = afterVars(~ismember(afterVars, beforeVars));');
        evalin('base', 'clear(toRemove{:});')
        times2(n) = toc;

        % Now clear everything (including the helper lists) for the next run
        evalin('base', 'clear(W{:})');
        evalin('base', 'clear beforeVars afterVars toRemove');
    end

    figure;
    plot(nVars, times1, nVars, times2);
    xlabel('Number of Variables to Keep')
    ylabel('Execution Time (sec)')
    legend({'clearvars -except', 'clear'});
end

function createVariables(N)
    for k = 1:N
        % Create a random 25-letter variable name
        % (randsample requires the Statistics and Machine Learning Toolbox)
        varname = randsample('a':'z', 25, 1);

        % Assign that variable within the workspace
        evalin('base', [varname, '= rand(1);']);
    end

    % Get a list of all variables
    evalin('base', 'W=who;');

    % Add 'W' to the list so it doesn't get cleared
    evalin('base', 'W = [''W''; W];');
end

[Figure: execution time (sec) versus number of variables to keep, comparing clearvars -except with clear]

Well... I think this graph speaks for itself as to which is faster.

I also ran a test like the one you proposed, where all variables were in the -except list, and it yielded similar results.

Upvotes: 4
