Reputation: 1436
I have a graph ( adjacency_list (listS, vecS, bidirectionalS, VertexVal) ) in which I need to delete 100,000+ nodes. Each node also contains a structure of 2 64-bit integers and another 64-bit integer. The guid check that happens in the code below is checking 1st integer in the structure.
On my laptop ( i7 2.7GHz, 16GB RAM ) it takes about 88 seconds according to VTune.
Following is how I delete the nodes:
vertex_iterator vi,vi_end;
boost::tie(vi, vi_end) = boost::vertices(m_graph);
while (vi!=vi_end) {
if (m_graph[*vi].guid.part1 == 0) {
boost::remove_vertex(*vi,m_graph);
boost::tie(vi, vi_end) = boost::vertices(m_graph);
} else
++vi;
}
Vtune shows that the boost::remove_vertex() call takes 88.145 seconds. Is there a more efficient way to delete these vertices?
Upvotes: 5
Views: 1487
Reputation: 1436
I was able to successfully serialize the graph using Boost serialization routines into a string, parse the string and remove the nodes I didn't need and de-serialize the modified string. For 200,000 total nodes in graph and 100,000 that needs to be deleted I was able to successfully finish the operation in less than 2 seconds.
For my particular use-case each vertex has 3 64bit integers. When it needs to be deleted, I mark 2 of those integers as 0s. A valid vertex would never have a 0. When the point comes to clean up the graph - to delete the "deleted" vertices, I follow the above logic.
In the code below removeDeletedNodes() does the string parsing and removing the vertices and mapping the edge numbers.
Upvotes: 3
Reputation: 392911
In your removal branch you re-tie()
the iterators:
boost::tie(vi, vi_end) = boost::vertices(m_graph);
This will cause the loop to restart every time you restart the loop. This is exactly Schlemiel The Painter.
I'll find out whether you can trust remove_vertex
not triggering a reallocation. If so, it's easily fixed. Otherwise, you'd want an indexer-based loop instead of iterator-based. Or you might be able to work on the raw container (it's a private member, though, as I remember).
Update Using vecS
as the container for vertices is going to cause bad performance here:
If the
VertexList
template parameter of theadjacency_list
wasvecS
, then all vertex descriptors, edge descriptors, and iterators for the graph are invalidated by this operation. <...> If you need to make frequent use of theremove_vertex()
function thelistS
selector is a much better choice for theVertexList
template parameter.
This small benchmark test.cpp
compares:
with -DSTABLE_IT
(listS
)
$ ./stable
Generated 100000 vertices and 5000 edges in 14954ms
The graph has a cycle? false
starting selective removal...
Done in 0ms
After: 99032 vertices and 4916 edges
without -DSTABLE_IT
(vecS
)
$ ./unstable
Generated 100000 vertices and 5000 edges in 76ms
The graph has a cycle? false
starting selective removal...
Done in 396ms
After: 99032 vertices and 4916 edges
using filtered_graph
(thanks @cv_and_he
in the comments)
Generated 100000 vertices and 5000 edges in 15ms
The graph has a cycle? false
starting selective removal...
Done in 0ms
After: 99032 vertices and 4916 edges
Done in 13ms
You can clearly see that removal is much faster for listS
but generating is much slower.
Upvotes: 9
Reputation: 16290
It would be interesting to see more of the Vtune data.
My experience has been that the default Microsoft allocator can be a big bottleneck when deleting tens of thousands of small objects. Does your Vtune graph show a lot of time in delete or free?
If so, consider switching to a third-party allocator. Nedmalloc is said to be good: http://www.nedprod.com/programs/portable/nedmalloc/
Google has one, tcmalloc, which is very well regarded and much faster than the built-in allocators on almost every platform. https://code.google.com/p/gperftools/ tcmalloc is not a drop-in for Windows.
Upvotes: 0