Reputation: 2325
I have the following code that goes through a matrix of 188k x 188k rows of data and attempts to create a network graph out of it. The problem here is my algorithm is extremely slow (as expected since its quadratic). Is there a better way of doing this that I'm not seeing? I'm already thinking of using openMP to parallelize this but would be great if I don't have to.
Here's whats true about my matrix - its symmetric, its over 188 thousand by 188 thousand, each value in the matrix corresponds to the edge weight So for example, an element aij is the weight of the edge between i and j. Here's my code:
The graph creation:
typedef boost::adjacency_list
<
boost::vecS,
boost::vecS,
boost::undirectedS,
boost::property<boost::vertex_name_t, std::string>,
boost::property<boost::edge_weight_t, float>,
boost::property<boost::graph_name_t, std::string>
> UGraph;
typedef UGraph::vertex_descriptor vertex_t;
typedef UGraph::edge_descriptor edge_t;
Now the function creating the network:
vertex_t u;
vertex_t v;
edge_t e;
bool found=0;
int idx =0;
float cos_similarity;
for(int p =1;p<=adj_matrix.cols();p++){
//using a previously created vector to track already created nodes
if(std::find(created_nodes.begin(), created_nodes.end(), nodes[idx]) == created_nodes.end()){
u = add_vertex(nodes[idx], ug);
created_nodes.push_back(nodes[idx]);
}else{
u = vertex(p,ug);
}
int jdx = 0;
for(int q =1;q<=adj_matrix.cols();q++){
if(p!=q){//NO LOOPS IN THIS GRAPH
//using a previously created vector to track already created nodes
if(std::find(created_nodes.begin(), created_nodes.end(), nodes[jdx]) == created_nodes.end()){
v = add_vertex(nodes[jdx], ug);
created_nodes.push_back(nodes[jdx]);
}else{
u = vertex(q,ug);
}
tie(e, found) = edge(u, v, ug);
if(!found){//check that edge does not already exist
cos_similarity = adj_matrix(p,q);
fil<<cos_similarity<<endl;
fil.flush();
if(cos_similarity >= 0.2609){ //only add edge if value of cell is greater than this threshold
boost::add_edge(u,v,cos_similarity, ug);
edge_out<<p<<" "<<q<<" "<<cos_similarity<<endl; //creating an edge-weight list for later use
}
}
}
jdx++;
}
idx++;
}
Upvotes: 2
Views: 127
Reputation: 56
A simple tip:
I think your algorithm is cubic rather than quadratic, because vector and std::find(vector.begin(), vector.end()) are used to avoid duplicates in the inside loop.
To avoid duplicates and keep the algorithm quadraic, you can just traverse the upper triangle of the matrix as it's symmetric, which means the graph is an undirected graph.
Upvotes: 4