Bob John
Bob John

Reputation: 467

How is a graph tyically defined?

I saw an interview question that had the graph structure defined as:

struct Node{
   vector<Node*> neighbors;
}

I thought this was unusual, or perhaps a mistake, since there is nothing to really differentiate between nodes. Is my reasoning correct, or can a graph be properly defined with just a vector of neighbors? I was thinking we would have to have something like this:

template<typename T>
struct Node{
   T value;
   vector<Node*> neighbors;
}

Which intuitively makes more sense to me.

Is there any "typical" way to define a graph? For example, with a binary tree we'd have (at minimum) a value, plus left and right pointers. With a linked list we'd have (at minimum) a value and a next pointer, etc.

Upvotes: 4

Views: 1144

Answers (5)

Adam27X
Adam27X

Reputation: 883

In the high-performance computing literature graphs are typically represented using the Compressed Sparse Row (CSR) format, which is also popular for sparse-matrix calculations. Many real-world graphs (road maps, social networks, etc.) are sparse, making adjacency matrices wasteful in terms of space. For sparse graphs, CSR format is preferable in terms of performance to adjacency lists because it requires fewer fetches to memory.

I wrote a blog post that includes a sample picture of a graph and its representation in this format for NVIDIA's Parallel Forall blog. See the section on "Sparse Graph Representation on the GPU." It turns out many CPU-based HPC graph algorithms also use this format.

To summarize that post, take a look at the following graph:


(source: nvidia.com)

(Ignore the BC[x] = y labels)

If we renumber the vertices to index from 0...8 rather than 1...9, we can represent the graph as follows in CSR format:


(source: nvidia.com)

where R is the row-offsets array and C is the column-indices array. The row-offsets array is an n+1 element array that points at where each vertex's adjacency begins and ends within the column indices array. For example, the adjacency list of a vertex u is located from C[R[u]] to C[R[u+1]-1], inclusively.

So for the above example if we look at vertex 4 in the picture of the graph, which we have renumbered to vertex 3, we can see that R[3] = 8 and R[4] = 12, which means the vertices adjacent to 3 are located from C[8] through C[12], or are vertices {0,2,4,5}, corresponding to {1,3,5,6} as shown in the picture.

Upvotes: 1

kamoor
kamoor

Reputation: 2939

This is pretty normal way of representing graph, especially in object oriented languages. It is very scalable too to represent millions of nodes connected each other. You can also represent graph using adjacency matrix which is good to learn algorithms and implement sample code. But this is not great way to represent million of nodes.

Between two data structures you provided, I would say second one is more practical because you always want to store some value in each node. But there are situations you don't have anything store, in that scenario you can very well use first data structure also. For example a bi-partite graph to solve max flow problem.

So answer would "based on the problem you are trying to solve".

Any problem solving using adjacency matrix can be represented by the first data structure provided. Data structures which require more than just vertices and edges are usually fall under more advanced "Augmented Data Structures" which is required to solve majority of the real world problems.

Some more references:

  1. Flow Network
  2. Adjacency List

Upvotes: 3

Dietmar K&#252;hl
Dietmar K&#252;hl

Reputation: 153965

What properties do you expect to be in a node? I'd expect to see a variable set of properties. Any fixed set of node (or edge) attributes is likely to be insufficient for some algorithms. Which led me to develop what I called "data accessors" and what is called "property maps" in BGL. Using a node address to determine properties of a graph wouldn't perturb me in the slightest although I would expect in most cases that something like an index is used. Depending on whether the graph size is sort of known or the graph is created on the fly, using an std::unordered_map<void*, P> to determine values of properties may be more effective. Also, depending on the algorithm only portions of the graph may be visited.

Graphs can be represented in may different ways. An adjacency list as in the example is one example. Most of the time you're better of using incidence lists (i.e., you have explicit representations of edges which is, e.g., important if you have parallel edges with different properties). You may have an adjacency matrix if there are no parallel edges and the graph is nearly complete. Using a decent abstraction to run your algorithms you can even have implicitly represented graphs. For example, you may define a graph where a nodes with indices N and M are adjacent if N and M have a GCD or some value.

How the properties are represented mostly depends on the needs of the algorithm. The node type given looks as if it is intended to be used in a object-oriented set-up which is, however, unlikely to yield decent performance. Given that many interesting graph algorithms have a complexity which is non-linear, operating fast makes a difference.

Upvotes: 2

aleberguer
aleberguer

Reputation: 370

If it's a small graph, you can represent it with a Adjacency matrix. Each cell (i,j) represents that node is adjacent to j.

Upvotes: 0

KalyanS
KalyanS

Reputation: 527

You are right. Without any identifying information associated with each vertex(node) in a graph, you can't do much with the graph.

The interviewer might have either slipped it or he could justify it with the idea that he could maintain a map associating each node with an id or name. When the information is substantial or expensive to maintain, they would go this way. That said, a pointer to the data from each node is more common than a map based approach.

Upvotes: 0

Related Questions