Reputation: 2990
I am working on a graph with 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> throws a segfault at me. I need to calculate all-pairs shortest paths with path recovery. What alternative data structures do I have?
I found this SO Thread but it doesn't answer my query.
EDIT
I tried this after reading the suggestions, and it seems to work. Thanks everyone for helping me out.
vector<vector<uint>> neighboursOf; // An edge between i and j exists if
                                   // neighboursOf[i] contains j
neighboursOf.resize(nodeCount);

string line;
while (getline(input, line))       // test the read itself rather than input.good()
{
    // Skip comments in the input file
    if (line.size() > 0 && line[0] == '#')
        continue;

    // Each line is of the format "<fromNodeId> [TAB] <toNodeId>"
    uint fromNodeId = 0;
    uint toNodeId = 0;
    sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId); // %u, since the ids are unsigned

    // Store the edge
    neighboursOf[fromNodeId].push_back(toNodeId);
}
Upvotes: 2
Views: 3113
Reputation: 7348
You could store the lists of edges per node in a single array. If the number of edges per node is variable, you can terminate each list with a null edge. This avoids the space overhead of many small lists (or similar data structures). The result could look like this:
enum {
    MAX_NODES = 875713,
    MAX_EDGES = 5105039,
};

int nodes[MAX_NODES+1];          // contains, for each node, the index into edges[]
                                 // where its list starts. node id zero is reserved
                                 // as the null node that terminates lists.
int edges[MAX_EDGES+MAX_NODES];  // contains null-terminated lists of edges. each
                                 // edge occupies a single entry in the array and
                                 // each list ends with the null node, so there are
                                 // MAX_EDGES edge entries and MAX_NODES lists.

[...]

/* find edges for node (node is assumed to hold a valid node id) */
int node, edge, edge_index;
for (edge_index = nodes[node]; edges[edge_index]; edge_index++) {
    edge = edges[edge_index];
    /* do something with edge... */
}
Minimizing the space overhead is very important since you have a huge number of small data structures. The overhead for each list of nodes is just one integer, which is much less than the overhead of e.g. an STL vector. The lists are also laid out contiguously in memory, which means there is no wasted space between any two lists. With variable-sized vectors this would not be the case.
Reading all the edges of a given node is very fast, because the edges of each node are stored contiguously in memory.
The downside of this data arrangement is that when you initialize the arrays and construct the edge lists, you need to have all the edges for a node at hand. This is not a problem if you get the edges sorted by node, but it does not work well if the edges arrive in random order.
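A minimal construction sketch of that layout (my interpretation, not part of the code above): it assumes node ids run from 1 to MAX_NODES so that 0 can serve as the null node, and that the input edge arrays from[] and to[] are already sorted by from-node.
/* build nodes[] and edges[] from a sorted edge list (sketch, assumed helper) */
void build_graph(const int *from, const int *to, int edge_count)
{
    int e = 0;                              /* next free slot in edges[] */
    int i = 0;                              /* position in the sorted input edge list */
    for (int n = 1; n <= MAX_NODES; n++) {  /* node id 0 is the reserved null node */
        nodes[n] = e;                       /* node n's list starts at edges[e] */
        while (i < edge_count && from[i] == n)
            edges[e++] = to[i++];           /* copy this node's outgoing edges */
        edges[e++] = 0;                     /* terminate the list with the null node */
    }
}
In total this writes edge_count + MAX_NODES entries, which is exactly what edges[MAX_EDGES+MAX_NODES] provides room for.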
Upvotes: 2
Reputation: 15725
Your graph is sparse, that is, |E| << |V|^2, so you should probably either use a sparse matrix to represent your adjacency matrix or, equivalently, store for each node a list of its neighbors (which results in a jagged array), like this -
vector<vector<int> > V (number_of_nodes);
// For each cell of V, which is a vector itself, push only the indices of adjacent nodes.
V[0].push_back(2); // Node number 2 is a neighbor of node number 0
...
V[number_of_nodes-1].push_back(...);
This way, your expected memory requirement is O(|E| + |V|) instead of O(|V|^2), which in your case should be around 50 MB instead of a gazillion MB.
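As a rough check (assuming 4-byte ints and a typical 24-byte std::vector header on a 64-bit implementation): a dense bit matrix needs 875713^2 bits, about 96 GB, while the adjacency lists need about 5105039 * 4 bytes = 20 MB of edge data plus 875713 * 24 bytes = 21 MB of vector headers, so on the order of the 50 MB mentioned above once allocator overhead is included.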
This will also result in a faster Dijkstra (or any other shortest-path algorithm) since you only need to consider the neighbors of a node at each step.
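If the edges are unweighted (each hop costs 1), a plain BFS per source already gives shortest paths with path recovery. Here is a minimal sketch over the jagged array above; the function name bfs_parents and the unweighted-graph assumption are mine, not part of the question:
#include <queue>
#include <vector>
using namespace std;

// Single-source shortest paths with path recovery on the adjacency list V.
// Run it once per source node to obtain all-pairs shortest paths.
vector<int> bfs_parents(const vector<vector<int> >& V, int source)
{
    vector<int> parent(V.size(), -1);   // -1 means "not reached yet"
    queue<int> q;
    parent[source] = source;            // mark the source as visited
    q.push(source);
    while (!q.empty()) {
        int u = q.front();
        q.pop();
        for (size_t k = 0; k < V[u].size(); ++k) {  // only the neighbors of u are scanned
            int v = V[u][k];
            if (parent[v] == -1) {
                parent[v] = u;          // first time v is reached: remember its predecessor
                q.push(v);
            }
        }
    }
    return parent;
}
To recover the path from the source to some target t, walk t, parent[t], parent[parent[t]], ... until you reach the source, then reverse the sequence.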
Upvotes: 4
Reputation: 1105
If we declare a Node as below:
struct Node {
    int node_id;
    vector<int> edges;   // all the edges starting from this node
};
Then all the nodes can be expressed as below:
vector<Node> nodes; // std::array needs a compile-time size, so a vector is used here
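A minimal usage sketch (the names nodeCount, fromNodeId, toNodeId and u are borrowed from the question's EDIT or assumed, not part of this answer):
vector<Node> nodes(nodeCount);            // one entry per node
for (int i = 0; i < nodeCount; ++i)
    nodes[i].node_id = i;

// store one edge read from the input
nodes[fromNodeId].edges.push_back(toNodeId);

// visit all neighbours of node u
for (size_t k = 0; k < nodes[u].edges.size(); ++k) {
    int v = nodes[u].edges[k];
    // ...
}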
Upvotes: 1