Reputation: 1090
I have a question regarding increasing the performance of clearing no longer required variables from MATLAB's workspace.
Assume that some variables are present in the workspace that shall be kept. Those variables can be stored as follows:
VarsToKeep = who
Also assume that after storing the variables, lots of new variables are generated. From time to time, I need to clear those newly generated variables which I currently do as follows:
eval(['clearvars -except ' cell2str(VarsToKeep,' ')]);
However, this operation seems to take quite long (more than 5s in my case).
Since performance is a major issue in my environment, I would like to know whether there is a more performant MATLAB command for that operation.
Upvotes: 2
Views: 441
Reputation: 65430
Rather than using eval, you can simply use the function syntax () to call clearvars and pass your VarsToKeep to it using {:} indexing to create a comma-separated list.
clearvars('-except', VarsToKeep{:});
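As a minimal sketch of the function-syntax call, the hypothetical function below (demoKeepList is not part of MATLAB) runs in its own workspace so the caller's variables are left alone. It also illustrates one detail: who is evaluated before the assignment, so the keep list does not contain its own name unless that name is appended, and the list would otherwise be cleared by the first clearvars call.

```matlab
function remaining = demoKeepList()
    % Hypothetical demo function; it runs in its own workspace so the
    % caller's variables are not touched.
    a = 1;
    b = 2;

    % Snapshot the names to keep. The list does not contain its own name
    % because who is evaluated before the assignment, so append it.
    varsToKeep = who;
    varsToKeep{end+1} = 'varsToKeep';

    % Variables created later that should be removed.
    c = 3;
    d = 4;

    % Function syntax: {:} expands the cell array into separate arguments.
    clearvars('-except', varsToKeep{:});

    % Names that survived the call.
    remaining = who;
end
```

Calling remaining = demoKeepList() should return only the names a, b, and varsToKeep.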
As for why it is slow, it depends on how many variables you are removing: the more variables there are, the longer they take to clear.
As you've stated in the comments, you have close to 500 variables that you are trying to prevent from being cleared. If we look at the internals of clearvars, there are a few reasons why specifying so many variables to exclude is slow.
Internally, clearvars creates a regular expression to determine which variables to keep. In your case, you have specified all of the variable names explicitly, so it must concatenate them into one big regular expression.
If, for example, we wanted to keep the variables A, B, and C, this regular expression would look something like:
'^(?!(A|B|C)$).'
And this regular expression basically only matches things that are not our variables of interest:
regexp({'A', 'AA', 'B', 'AB', 'C', 'D'}, '^(?!(A|B|C)$).');
% [] [1] [] [1] [] [1]
This regular expression is just passed to the built-in clear to do the actual clearing in the following way:
clear -regexp '^(?!(A|B|C)$).'
clear then has to compare every variable against this regular expression to determine whether to remove it or not.
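To make that comparison concrete, here is a small sketch that builds the same kind of pattern from a keep list and reports which names it would match, that is, which variables would be cleared. The variable names keep, names, pattern, and wouldClear are purely illustrative.

```matlab
% Names to protect and a set of candidate variable names (illustrative).
keep  = {'A', 'B', 'C'};
names = {'A', 'AA', 'B', 'AB', 'C', 'D'};

% Negative lookahead: match any name that is NOT exactly one of the kept names.
pattern = sprintf('^(?!(%s)$).', strjoin(keep, '|'));

% regexp returns the match start index per name, or [] when there is no match.
starts = regexp(names, pattern, 'once');

% A non-empty result means the name matched, so it would be cleared.
wouldClear = ~cellfun(@isempty, starts);
disp(names(wouldClear))
```

Only AA, AB, and D are flagged, while the exact names A, B, and C are left alone.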
Now it's important to remember that regular expressions aren't the most computationally friendly thing and it only gets worse as they grow more and more complex. As you add variables to exclude, this regular expression keeps growing bigger. Also, this regular expression has to be evaluated for each variable in your workspace so things add up quickly.
Just as a proof of concept, if you had 25 variables with names of 25 characters each, the regex would look like this:
^(?!(onhwbcwijwjjoxyowepmnjeac|jkowjywrerpfamjpdtcisttpy|qtaihttmztryenwyfdzhnunsw|fyhvhmybvbqulietxwitalcjd|noeszudvzieizcbvpraycicnt|gkhdpwticanasbjfyrgjytzlp|nfrrgwghhalhlzawaqtqzdxkd|ritwzxekjcctmyooeuoutufod|sfuimpzzcavgxyuhqhbrttrjn|zquelkgrexmgbogtzegyineay|qyjuxjkfkpnluafyownikibtv|xxoprkwzvrkkvcozvvlhhlaft|nkvuoaxsiztuixtbmmbdaoijb|hdsdopqyndjsbuvefvkcxzohl|pzlitikbyysnpwewzraiifmgi|zfucjwulrnzluxqyohsdmophc|gbdiiftvfbsqoregmmzpemadw|rjdfzlznmshxpvbvqhcwhsuud|ekbluwjwevpgsjbnqjvzybxul|jfmgqyvomuhprelxnolizptxn|iyhnvmdyvrenfhpsmfawqvqga|jcfbtajkidnimopxmawzblfmq|yyccqfoftjqmcbeainaeweeyk|jelyftqgcqkepnpyzdkrpqpam|mbucicotugqiksqkpgryhzwev)$).
If you want to benchmark how long just the regular expression part takes, you can do the following:
% Construct a regular expression of all of your variables
regex = sprintf('^(?!(%s)$).', strjoin(VarsToKeep, '|'));
% Now match all variables to this regex.
matches = regexp(VarsToKeep, regex);
This explains why even if you are keeping all variables in your workspace it's still terribly slow, because MATLAB still has to construct this giant regular expression and compare it to every variable only to find that you excluded all of them.
Note that this is just the overhead that doesn't actually include clearing the underlying data.
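To put a rough number on that overhead without touching a real workspace, something along these lines could be used. The function name timeExcludeRegex and the synthetic random names are assumptions made for illustration; timeit is MATLAB's standard function for timing a function handle.

```matlab
function t = timeExcludeRegex(nNames, nameLen)
    % Hypothetical helper: time only the regex matching step for nNames
    % synthetic variable names of nameLen lowercase letters each.
    letters = 'a':'z';
    names = cell(1, nNames);
    for k = 1:nNames
        names{k} = letters(randi(numel(letters), 1, nameLen));
    end

    % Same pattern shape as above: exclude every listed name.
    pattern = sprintf('^(?!(%s)$).', strjoin(names, '|'));

    % timeit runs the handle several times and returns a typical time in seconds.
    t = timeit(@() regexp(names, pattern));
end
```

For example, comparing timeExcludeRegex(50, 25) against timeExcludeRegex(500, 25) gives a feel for how the matching cost grows with the number of excluded names. The actual figures depend on the machine and MATLAB release.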
Rather than using all of this regular expression matching, it will likely be faster to get a list of variables before and after you run your code, then use setdiff or ismember to determine which ones were added, and then clear these explicitly with clear.
% Keep track of the variables before we started
beforeVars = who;
% Do stuff
% Get the list of variables after we're done
afterVars = who;
% Figure out which ones were added
toRemove = afterVars(~ismember(afterVars, beforeVars));
% Now clear these variables explicitly (no regular expressions involved)
clear(toRemove{:})
This is still going to take a long time if you have large variables defined in your workspace, but at least you aren't wasting much time identifying which variables to remove.
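The same idea can be sketched with setdiff instead of ismember. The function demoClearNew and its variables are hypothetical, and it runs in its own workspace so nothing in the caller is affected. The isempty guard is there because clear called with no arguments removes every variable, which is what an empty toRemove{:} would expand to.

```matlab
function remaining = demoClearNew()
    % Hypothetical demo of the before/after approach using setdiff.
    keepMe = 1;

    % Names present at the start, plus the list itself so it is kept too.
    beforeVars = who;
    beforeVars{end+1} = 'beforeVars';

    % Scratch variables created by the "real work".
    tmp1 = rand(3);
    tmp2 = 'scratch';

    % Names present now; anything not in beforeVars is new.
    afterVars = who;
    toRemove = setdiff(afterVars, beforeVars);

    % Guard: clear() with no arguments would wipe the whole workspace.
    if ~isempty(toRemove)
        clear(toRemove{:});
    end

    % Bookkeeping variables are no longer needed either.
    clear afterVars toRemove

    remaining = who;
end
```

Calling remaining = demoClearNew() should leave only beforeVars and keepMe.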
I was actually curious what kind of performance I got, so I designed a quick little benchmark to test it. Essentially, I create N variables in the base workspace (with random names of a specific length) and assign them all a random scalar value (so clearing the data itself should take essentially no time). Then I apply the two methods to remove half of them.
function testclear
    % Range of sizes to test (I don't have all day so I only tested 5)
    nVars = round(linspace(1, 500, 5));
    times1 = zeros(1, numel(nVars));
    times2 = zeros(1, numel(nVars));
    for n = 1:numel(nVars)
        %% TEST THE CLEARVARS WAY
        % Now create twice as many variables (we will clear half)
        createVariables(2 * nVars(n));
        % Now we're going to clear half the variables and time it
        tic
        evalin('base', 'clearvars(''-except'', W{1:ceil(numel(W)/2)})');
        times1(n) = toc;
        % Now clear everything for the next run
        evalin('base', 'clear(W{:})');
        %% EXPLICITLY PASS TO CLEAR
        createVariables(2 * nVars(n));
        evalin('base', 'beforeVars = W(1:ceil(numel(W)/2));')
        evalin('base', 'afterVars = W((ceil(numel(W)/2) + 1):end);')
        tic
        evalin('base', 'toRemove = afterVars(~ismember(afterVars, beforeVars));');
        evalin('base', 'clear(toRemove{:});')
        times2(n) = toc;
        % Now clear everything for the next run
        evalin('base', 'clear(W{:})');
    end
    figure;
    plot(nVars, times1, nVars, times2);
    xlabel('Number of Variables to Keep')
    ylabel('Execution Time (sec)')
    legend({'clearvars -except', 'clear'});
end

function createVariables(N)
    for k = 1:N
        % Create a random variable name
        varname = randsample('a':'z', 25, 1);
        % Assign that variable within the workspace
        evalin('base', [varname, '= rand(1);']);
    end
    % Get a list of all variables
    evalin('base', 'W=who;');
    % Add 'W' to the list so it doesn't get cleared
    evalin('base', 'W = [''W''; W];');
end
Well... I think this graph speaks for itself as to which is faster.
I ran a similar test to the one you proposed, where all variables were in the -except list, and it yielded similar results.
Upvotes: 4