Count relationship pairs in Neo4j

Question

I'm just starting out with Neo4j, so I'm not yet proficient with it. The figure below shows a 2-mode (bipartite) graph in which a green node represents a "document" and a red node represents a "term" that occurs in a particular document. (The real-world graph is huge: about 20,000,000 documents and 25,000 terms.)

How can I count co-occurrence pairs of terms in Neo4j (in Cypher or Java)? The desired output of the query is:

# Example: Pair (term-1, term-2) occurs in doc-1 and in doc-3
# Frequency for pair (term-1, term-2) should be 2
# term A | term B | frequency
term-1 | term-2 | 2
term-1 | term-3 | 1
term-2 | term-3 | 2

Bipartite graph

Graph is available at http://console.neo4j.org/r/7fmo7c

Code to reproduce test graph in Neo4j

set name root
mkrel -t ROOT -c -v
cd 1
set name doc-1
set type document
mkrel -t HAVE -cv
cd 2
set name term-1
set type term
cd ..
mkrel -t HAVE -cv
cd 3
set name term-2
set type term
cd ..
mkrel -t HAVE -cv
cd 4
set name term-3
set type term
mkrel -t HAVE -d INCOMING -c
cd 5
set name doc-2
set type document
mkrel -t HAVE -d OUTGOING -n 3
cd 3
mkrel -t HAVE -d INCOMING -c
cd 6
set name doc-3
set type document
mkrel -t HAVE -d OUTGOING -n 2

Code to reproduce test graph in Java

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;

public class CountPairs {
    private static final String DB_PATH = "test.db";
    private static GraphDatabaseService graphDb;

    public static void main(String[] args) {
        graphDb = new GraphDatabaseFactory().
                newEmbeddedDatabaseBuilder(DB_PATH).
                setConfig(GraphDatabaseSettings.node_keys_indexable, "name, type").
                setConfig(GraphDatabaseSettings.node_auto_indexing, "true").
                newGraphDatabase();

        Transaction tx = graphDb.beginTx();
        Node doc1, doc2, doc3 = null;
        Node term1, term2, term3 = null;
        Relationship rel1, rel2, rel3, rel4, rel5, rel6, rel7 = null;
        try
        {
            // Create nodes
            doc1 = graphDb.createNode();
            doc2 = graphDb.createNode();
            doc3 = graphDb.createNode();
            term1 = graphDb.createNode();
            term2 = graphDb.createNode();
            term3 = graphDb.createNode();
            // Set document properties
            doc1.setProperty("name", "doc1");
            doc1.setProperty("type", "document");
            doc2.setProperty("name", "doc2");
            doc2.setProperty("type", "document");
            doc3.setProperty("name", "doc3");
            doc3.setProperty("type", "document");
            // Set term properties
            term1.setProperty("name", "term1");
            term1.setProperty("type", "term");
            term2.setProperty("name", "term2");
            term2.setProperty("type", "term");
            term3.setProperty("name", "term3");
            term3.setProperty("type", "term");
            // Create relations
            rel1 = doc1.createRelationshipTo(term1, DynamicRelationshipType.withName("HAVE"));
            rel2 = doc1.createRelationshipTo(term2, DynamicRelationshipType.withName("HAVE"));
            rel3 = doc1.createRelationshipTo(term3, DynamicRelationshipType.withName("HAVE"));
            rel4 = doc2.createRelationshipTo(term2, DynamicRelationshipType.withName("HAVE"));
            rel5 = doc2.createRelationshipTo(term3, DynamicRelationshipType.withName("HAVE"));
            rel6 = doc3.createRelationshipTo(term1, DynamicRelationshipType.withName("HAVE"));
            rel7 = doc3.createRelationshipTo(term2, DynamicRelationshipType.withName("HAVE"));

            tx.success();
        }
        catch(Exception e)
        {
            // Report the error instead of swallowing it silently, then roll back
            e.printStackTrace();
            tx.failure();
        }
        finally
        {
            tx.finish();
        }

        graphDb.shutdown();
    }
}

Gopi · Accepted Answer

start t1=node(*), t2=node(*) 
where has(t1.type) and has(t2.type) and t1.type='term' and t2.type='term' and id(t1) < id(t2)
with t1, t2 
match t1<-[:HAVE]-doc-[:HAVE]->t2 
where doc.type='document' 
return t1, t2, count(doc)

You can try it here: http://console.neo4j.org/r/pshvqx

I hope this is what you want. For better performance, I suggest indexing the nodes of type 'term' and using that index in the start clause to look up t1 and t2.
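
To sanity-check the counts outside Neo4j, here is a minimal plain-Java sketch of what the query computes: for every unordered pair of terms, the number of documents that contain both. It does not use the Neo4j API. The class name CoOccurrenceSketch and the methods countPairs and sampleGraph are hypothetical names chosen for illustration, and the sample data mirrors the test graph from the question (using the doc-1/term-1 naming). It keeps everything in memory, so treat it as a way to verify small samples rather than a solution for the full graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper class, not part of Neo4j.
final class CoOccurrenceSketch {

    // For every unordered pair of terms, count the documents containing both.
    // Keys have the form "termA | termB" with termA sorted before termB.
    static Map<String, Integer> countPairs(Map<String, List<String>> termsByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> terms : termsByDoc.values()) {
            // Sort and de-duplicate so each pair is visited once per document.
            // This plays the role of id(t1) < id(t2) in the Cypher query.
            List<String> sorted = new ArrayList<>(new TreeSet<>(terms));
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    String pair = sorted.get(i) + " | " + sorted.get(j);
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // The test graph from the question: document name -> terms it HAVEs.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> termsByDoc = new LinkedHashMap<>();
        termsByDoc.put("doc-1", Arrays.asList("term-1", "term-2", "term-3"));
        termsByDoc.put("doc-2", Arrays.asList("term-2", "term-3"));
        termsByDoc.put("doc-3", Arrays.asList("term-1", "term-2"));
        return termsByDoc;
    }

    public static void main(String[] args) {
        // Prints one line per pair: termA | termB | frequency
        countPairs(sampleGraph())
                .forEach((pair, n) -> System.out.println(pair + " | " + n));
    }
}
```

On the sample graph this yields a frequency of 2 for (term-1, term-2), 1 for (term-1, term-3) and 2 for (term-2, term-3), which matches the desired output in the question.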
