MATLAB program takes more than 1 hour to execute

Question

The below program is a program for finding k-clique communities from a input graph. The graph dataset can be found here.

The first line of the dataset contains 'number of nodes and edges' respectively. The following lines have 'node1 node2' representing an edge between node1 and node2 . For example:

2500 6589    // number_of_nodes, number_of_edges    
0 5          // edge between node[0] and node[5]    
.    
.    
.

The k-clique( aCliqueSIZE, anAdjacencyMATRIX ) function is contained here.

The following commands are executed in command window of MATLAB:

x         = textread( 'amazon.graph.small' );            %% source input file text 
s         = max(x(1,1), x(1,2));                         %% take largest dimemsion
adjMatrix = sparse(x(2:end,1)+1, x(2:end,2)+1, 1, s, s); %% now matrix is square    
adjMatrix = adjMatrix | adjMatrix.';                     %% apply "or" with transpose to make symmetric
adjMatrix = full(adjMatrix);                             %% convert to full if needed
k=4;
[X,Y,Z]=k_clique(k,adjMatrix);                                   %% 

% The output can be viewed by the following commands    
celldisp(X);    
celldisp(Y);    
Z

The above program takes more than 1 hour to execute whereas I think this shouldn't be the case. While running the program on windows, I checked the task manager and found that only 500 MB is allocated for the program. Is this the reason for the slowness of the program? If yes, then how can I allocate more heap memory (close to 4GB) to this program in MATLAB?

user3666197 · Accepted Answer

The problem does not seem to be Memory-bound

Having a sparse, square, symmetric matrix of 6k5 * 6k5 edges does not mean a big memory.

The provided code has many for loops and is heavily recursive in the tail function transfer_nodes()

Add a "Stone-Age-Profiler" into the code

To show the respective times spent on a CPU-bound sections of the processing, wrap the main sections of the code into a construct of: tic(); for .... end;toc() which will print you the CPU-bound times spent on relevent sections of the k_clique.m code, showing the readings "on-the-fly"

Your original code k_clique.m

function [components,cliques,CC] = k_clique(k,M)
% k-clique algorithm for detecting overlapping communities in a network 
% as defined in the paper "Uncovering the overlapping 
% community structure of complex networks in nature and society"
% 
% [X,Y,Z] = k_clique(k,A)
% 
% Inputs: 
% k - clique size 
% A - adjacency matrix
% 
% Outputs: 
% X - detected communities
% Y - all cliques (i.e. complete subgraphs that are not parts of larger
% complete subgraphs)
% Z - k-clique matrix


nb_nodes = size(M,1); % number of nodes 

% Find the largest possible clique size via the degree sequence:
% Let {d1,d2,...,dk} be the degree sequence of a graph. The largest
% possible clique size of the graph is the maximum value k such that
% dk >= k-1
degree_sequence = sort(sum(M,2) - 1,'descend');
%max_s = degree_sequence(1);
max_s = 0;
for i = 1:length(degree_sequence)
    if degree_sequence(i) >= i - 1
        max_s = i;
    else 
        break;
    end
end

cliques = cell(0);
% Find all s-size kliques in the graph
for s = max_s:-1:3
    M_aux = M;
    % Looping over nodes
    for n = 1:nb_nodes
        A = n; % Set of nodes all linked to each other
        B = setdiff(find(M_aux(n,:)==1),n); % Set of nodes that are linked to each node in A, but not necessarily to the nodes in B
        C = transfer_nodes(A,B,s,M_aux); % Enlarging A by transferring nodes from B
        if ~isempty(C)
            for i = size(C,1)
                cliques = [cliques;{C(i,:)}];
            end
        end
        M_aux(n,:) = 0; % Remove the processed node
        M_aux(:,n) = 0;
    end
end

% Generating the clique-clique overlap matrix
CC = zeros(length(cliques));
for c1 = 1:length(cliques)
    for c2 = c1:length(cliques)
        if c1==c2
            CC(c1,c2) = numel(cliques{c1});
        else
            CC(c1,c2) = numel(intersect(cliques{c1},cliques{c2}));
            CC(c2,c1) = CC(c1,c2);
        end
    end
end

% Extracting the k-clique matrix from the clique-clique overlap matrix
% Off-diagonal elements <= k-1 --> 0
% Diagonal elements <= k --> 0
CC(eye(size(CC))==1) = CC(eye(size(CC))==1) - k;
CC(eye(size(CC))~=1) = CC(eye(size(CC))~=1) - k + 1;
CC(CC >= 0) = 1;
CC(CC < 0) = 0;

% Extracting components (or k-clique communities) from the k-clique matrix
components = [];
for i = 1:length(cliques)
    linked_cliques = find(CC(i,:)==1);
    new_component = [];
    for j = 1:length(linked_cliques)
        new_component = union(new_component,cliques{linked_cliques(j)});
    end
    found = false;
    if ~isempty(new_component)
        for j = 1:length(components)
            if all(ismember(new_component,components{j}))
                found = true;
            end
        end
        if ~found
            components = [components; {new_component}];
        end
    end
end


    function R = transfer_nodes(S1,S2,clique_size,C)
        % Recursive function to transfer nodes from set B to set A (as
        % defined above)

        % Check if the union of S1 and S2 or S1 is inside an already found larger
        % clique 
        found_s12 = false;
        found_s1 = false;
        for c = 1:length(cliques)
            for cc = 1:size(cliques{c},1)
                if all(ismember(S1,cliques{c}(cc,:)))
                    found_s1 = true;
                end
                if all(ismember(union(S1,S2),cliques{c}(cc,:)))
                    found_s12 = true;
                    break;
                end
            end
        end

        if found_s12 || (length(S1) ~= clique_size && isempty(S2))
            % If the union of the sets A and B can be included in an
            % already found (larger) clique, the recursion is stepped back
            % to check other possibilities
            R = [];
        elseif length(S1) == clique_size;
            % The size of A reaches s, a new clique is found
            if found_s1
                R = [];
            else
                R = S1;
            end
        else
            % Check the remaining possible combinations of the neighbors
            % indices
            if isempty(find(S2>=max(S1),1))
                R = [];
            else
                R = [];
                for w = find(S2>=max(S1),1):length(S2)
                    S2_aux = S2;
                    S1_aux = S1;
                    S1_aux = [S1_aux S2_aux(w)];
                    S2_aux = setdiff(S2_aux(C(S2(w),S2_aux)==1),S2_aux(w));
                    R = [R;transfer_nodes(S1_aux,S2_aux,clique_size,C)];
                end
            end
        end
    end
end

MATLAB program takes more than 1 hour to execute

Answers (1)

The problem does not seem to be Memory-bound

Add a "Stone-Age-Profiler" into the code

Related Questions