Wipster

Reputation: 1570

Smart way to generate edges in Neo4J for big graphs

I want to generate a graph from a CSV file. The rows are the vertices and the columns are the attributes. I want to generate the edges by similarity between vertices (not necessarily with weights), so that when two vertices have the same value for some attribute, the edge between them carries that attribute with the value 1 or true.

The simplest Cypher query that occurs to me looks something like this:

Match (a:LABEL), (b:LABEL)
WHERE a.attr = b.attr
CREATE (a)-[r:SIMILAR {attr : 1}]->(b)

The graph has about 148000 vertices, and the Java Heap Size option is set to: dynamically calculated based on available system resources.

The query I posted gives a Neo.DatabaseError.General.UnknownFailure with a hint about the Java heap space mentioned above.

One problem I can think of is that a huge cartesian product is built first and only then filtered for matches to create edges. Is there a smarter, perhaps incremental, way to do this?

Upvotes: 2

Views: 107

Answers (2)

stdob--

Reputation: 29167

I think you need to change the model a little: there is no need to connect every node to every other node that shares the value of a particular attribute. It is better to have an intermediate node to which you bind all the nodes with the same attribute value.

This can be done at export time or later.

For example:

Match (A:LABEL) Where A.attr Is Not Null
Merge (S:Similar {propName: 'attr', propValue: A.attr})
Merge (A)-[r:Similar]->(S)

Later, with a separate query, you can remove the Similar nodes that have only one connection (that is, no other node has an equal value for this attribute):

Match (S:Similar)<-[r]-()
With S, count(r) As r Where r=1 
Detach Delete S

If you need to connect by all properties, you can use the following query:

Match (A:LABEL) Where A.attr Is Not Null
With A, Keys(A) As keys
  Unwind keys as key
    Merge (S:Similar {propName: key, propValue: A[key]})
    Merge (A)-[:Similar]->(S)
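
To get a feel for why the intermediate-node model is lighter, here is a minimal Python sketch that counts the relationships each model would need for one attribute. The function name and the sample data are made up for illustration; it only does arithmetic and does not talk to Neo4j.

```python
from collections import Counter

def relationship_counts(values):
    """Compare relationship counts for a single attribute.

    values: one attribute value per node (None means the property is missing).
    Returns (pairwise, hub).
    """
    group_sizes = Counter(v for v in values if v is not None)
    # Pairwise model: one relationship per unordered pair of nodes sharing a value.
    pairwise = sum(n * (n - 1) // 2 for n in group_sizes.values())
    # Intermediate-node model: one relationship per node to its shared value node.
    # Values held by a single node are skipped, since those Similar nodes get deleted.
    hub = sum(n for n in group_sizes.values() if n > 1)
    return pairwise, hub

# Hypothetical data: 1000 nodes share "red", 3 share "blue", one is "green", one has no value.
values = ["red"] * 1000 + ["blue"] * 3 + ["green"] + [None]
pairwise, hub = relationship_counts(values)
print(pairwise, hub)
```

The pairwise count grows quadratically with the size of each group of equal values, while the intermediate-node count grows linearly, which is the point of the model change.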

Upvotes: 2

Christophe Willemsen

Reputation: 20185

You're right that a huuuge cartesian product will be produced.

You can iterate over the a nodes in batches of, e.g., 1000 and rerun the query, incrementing the SKIP value on every iteration until it returns 0.

MATCH (a:Label)
WITH a SKIP 0 LIMIT 1000
MATCH (b:Label)
WHERE b.attr = a.attr AND id(b) > id(a)
CREATE (a)-[:SIMILAR_TO {attr: 1}]->(b)
RETURN count(*) as c
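
The iteration itself has to be driven from client code. A rough Python sketch of such a loop follows. `run_query` is a hypothetical callable (not a real driver API) that executes the query in its own transaction and returns the value of `c`. The `$skip`/`$limit` parameters and the added `ORDER BY id(a)` are assumptions, not part of the query above. As a variation, this sketch stops on a known node count instead of on a zero result, because a batch that happens to create no relationships also returns 0.

```python
BATCH_QUERY = """
MATCH (a:Label)
WITH a ORDER BY id(a) SKIP $skip LIMIT $limit
MATCH (b:Label)
WHERE b.attr = a.attr AND id(b) > id(a)
CREATE (a)-[:SIMILAR_TO {attr: 1}]->(b)
RETURN count(*) AS c
"""

def create_in_batches(run_query, total_nodes, batch_size=1000):
    """Run BATCH_QUERY once per window of `batch_size` nodes.

    run_query(query, params) is a hypothetical callable that runs the query
    in a separate transaction and returns the number of created relationships.
    total_nodes is the number of :Label nodes (about 148000 in the question).
    """
    created = 0
    for skip in range(0, total_nodes, batch_size):
        # Each call only pairs one window of `a` nodes with the `b` nodes.
        created += run_query(BATCH_QUERY, {"skip": skip, "limit": batch_size})
    return created
```

With a real driver, `run_query` would wrap whatever session or transaction call you already use; each batch then commits separately, so the heap only has to hold one window of work at a time.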

Upvotes: 1
