Kshitiz Sharma
Kshitiz Sharma

Reputation: 18627

Elegant way to implement a navigable graph?

This is a design problem. I'm struggling to create a conceptual model for a problem I'm facing.

I have a graph of a number of objects (<1000). These objects are connected together in a myriad of ways. Each of these objects have some attributes.

I need to be able to access these object via both their connections and their attributes.

For example let us assume following objects -

{name: A, attributes:{black, thin, invalid}, connections: {B,C}} 
{name: B, attributes:{white, thin, valid}, connections: {A}} 
{name: C, attributes:{black, thick, invalid}, connections: {A,B}}

Now I should be able to query this graph in following ways - Using attributes -

black - yields [A,C]
black.thick - yields C

Using connections -

A.connections[0].connections[0] - yields A

Using combination thereof -

black[0].connections[0] - yields B

My primary language is Java. But I don't think Java is capable of handling these kinds of beasts. Thus I'm trying to implement this in a dynamic language like Python. I have also thought about using expression language evaluation like OGNL, or a Graph database. But I'm confused. I'm not interested in coding solutions. But what is the correct way to model such a problem?

Upvotes: 3

Views: 442

Answers (4)

stephen mallette
stephen mallette

Reputation: 46226

I think that solving this problem with a graph makes sense. You mention the possibility of using a graph database which I think will allow you to better focus on your problem as opposed to coding infrastructure. A simple in-memory graph like TinkerGraph from the TinkerPop project would be a good place to start.

By using TinkerGraph you then get access to a query language called Gremlin (also see GremlinDocs)which can help answer the questions you posed in your post. Here's a Gremlin session in the REPL which show how to construct the graph you presented and some sample graph traversals that yield the answers you wanted...this first part simple constructs the graph given your example:

gremlin> g = new TinkerGraph()
==>tinkergraph[vertices:0 edges:0]
gremlin> a = g.addVertex("A",['color':'black','width':'thin','status':'invalid']) 
==>v[A]
gremlin> b = g.addVertex("B",['color':'white','width':'thin','status':'valid'])  
==>v[B]
gremlin> c = g.addVertex("C",['color':'black','width':'thick','status':'invalid'])
==>v[C]
gremlin> a.addEdge('connection',b)                                                
==>e[0][A-connection->B]
gremlin> a.addEdge('connection',c)                                                
==>e[1][A-connection->C]
gremlin> b.addEdge('connection',a)                                                
==>e[2][B-connection->A]
gremlin> c.addEdge('connection',a)                                                
==>e[3][C-connection->A]
gremlin> c.addEdge('connection',b)                                                
==>e[4][C-connection->B]

Now the queries:

// black - yields [A,C]
gremlin> g.V.has('color','black')
==>v[A]
==>v[C]

// black.thick - yields C
gremlin> g.V.has('width','thick')
==>v[C]

// A.connections[0].connections[0] - yields A
gremlin> a.out.out[0]
==>v[A]

// black[0].connections[0] - yields B
gremlin> g.V.has('color','black')[0].out[0]
==>v[B]

While this approach does introduce some learning curve if you are unfamiliar with the stack, I think you'll find that graphs fit as solutions to many problems and having some experience with the TinkerPop stack will be generally helpful for other scenarios you encounter.

Upvotes: 0

Dev Blanked
Dev Blanked

Reputation: 8865

It sounds like you have some object model which you want to query in different ways. One solution would be to use Java to create your model and then use a scripting language to support querying against this model in different ways. e.g: Java + Groovy would be my recommendation.

You could use the following Java class for the model.

public class Node {

    private String name;
    private final Set<String> attributes = new HashSet<String>();
    private final List<Node> connections = new ArrayList<Node>();

    // getter / setter for all
}

You should then populate a list of such objects with 'connections' property properly populated.

To support different kinds of scripting what you need to do is create a context for the scripts and then populated this context. Context is basically a map. The keys of the map become variables available to the script. The trick is to populate this context to support your querying requirements.

For example in groovy the binding is the context (refer http://groovy.codehaus.org/Embedding+Groovy). So if you populate it the following way your querying needs will be taken care of

Context/Binding Map

1. <Node name(String), Node object instance(Node)>
2. <Attribute name(String), list of nodes having this attribute(List<Node>)>

when you evaluate a script saying 'A.connections[0]', in the binding the object stored against key 'A' would be looked up. Then the returned objects 'connections' property will be accessed. Since that is a list the '[0]' syntax on that is permitted in groovy. This will return the object at index 0. Likewise to support your querying requirements you need to populate the context.

Upvotes: 1

Petr
Petr

Reputation: 63389

There is no problem expressing this in Java. Just define classes representing nodes sets of nodes. Assuming that there is a fixed set of attributes, it could look like:

enum Attribute {
    BLACK, WHITE, THIN, VALID /* etc. */ ;
}
class Node {
    public final String name;
    public final EnumSet<Attribute> attrs
        = EnumSet.noneOf(Attribute.class);
    public final NodeSet connections
        = new NodeSet();

    public Node(String name)
    {
        this.name = name;
    }

    // ... methods for adding attributes and connections
}

and then a class that represents a set of nodes:

class NodeSet extends LinkedHashSet<Node> {
    /**
     * Filters out nodes with at least one of the attributes.
     */
    public NodeSet with(Attribute... as) {
        NodeSet out = new NodeSet();
        for(Node n : this) {
            for(a : as)
                if (n.attrs.contains(a)) {
                    out.add(n);
                    break;
                }
        }
        return out;
    }

    /**
     * Returns all nodes connected to this set.
     */
    public NodeSet connections() {
        NodeSet out = new NodeSet();
        for(Node n : this)
            out.addAll(n.connections);
        return out;
    }

    /**
     * Returns the first node in the set.
     */
    public Node first() {
        return iterator().next();
    }
}

(I haven't checked that the code compiles, it's just a sketch.) Then, assuming you have a NodeSet all of all the nodes, you can do things like

all.with(BLACK).first().connections()

Upvotes: 0

Pat Lillis
Pat Lillis

Reputation: 1180

It depends where you want your performance to be.

If you want fast queries, and don't mind a bit of extra time/memory when adding an object, keeping an array/list of pointers to objects with specific attributes might be a good idea (particularly if you know during design-time what the possible attributes could be). Then, when adding a new object, say:

{name: A, attributes:{black, thin, invalid}, connections: {B,C}} 

add a new pointer to the black list, the thin list, and the invalid list. Quick queries on connections will probably require keeping a list/array of pointers as a member of the object class. Then when you create an object, add pointers for the correct objects.

If you don't mind slower queries and want to optimize performance when adding objects, a linked list might be a better approach. You can just loop through all of the objects, checking at each one if it satisfies the condition of the query.

In this case, it would still be a good idea to keep member pointers for the connections, if (as your question would seem to indicate) you're looking to do multiple-level queries (i.e. A.connections[0].connections[0]. This will result in extremely poor performance if done via nested loops.)

Hopefully that helps, it really kind of depends on what kind of queries you're expecting to call most frequently.

Upvotes: 0

Related Questions