ron

Reputation: 1

My Neo4j Cypher query is very slow. How can I improve it?

There are 15M nodes and 150M relationships in the database. I run the following Cypher query and it takes more than 200 seconds to return the result. The machine's CPU and memory are low. What should I do to improve this? I'd appreciate any advice.

Cypher:

START a=node:node_auto_index(userId='32887522') 
MATCH a -[:RELATIONSHIP_TYPE_FRIEND]- b -[:RELATIONSHIP_TYPE_FRIEND]- c 
WHERE NOT(a -[:RELATIONSHIP_TYPE_FRIEND]- c) AND NOT(a=c) 
RETURN c.userId as userId, COUNT(b) AS commonFriends 
ORDER BY commonFriends DESC 
LIMIT 100;

Execution plan:

ColumnFilter(symKeys=["userId", "  INTERNAL_AGGREGATE24121597-f14e-4ddf-b29d-0e3397500829"], returnItemNames=["userId", "commonFriends"], _rows=100, _db_hits=0)

==> Top(orderBy=["SortItem(Cached(  INTERNAL_AGGREGATE24121597-f14e-4ddf-b29d-0e3397500829 of type Long),false)"], limit="Literal", _rows=100, _db_hits=0)

==>   EagerAggregation(keys=["Cached(userId of type Any)"], aggregates=["(  INTERNAL_AGGREGATE24121597-f14e-4ddf-b29d-0e3397500829,Count)"], _rows=3656, _db_hits=0)

==>     Extract(symKeys=["  UNNAMED60", "a", "b", "  UNNAMED92", "c"], exprKeys=["userId"], _rows=15416, _db_hits=15416)

==>       Filter(pred="(NOT(nonEmpty(a-[  UNNAMED137:RELATIONSHIP_TYPE_FRIEND]-c)) AND NOT(a == c))", _rows=15416, _db_hits=0)

==>         TraversalMatcher(trail="(a)-[  UNNAMED60:RELATIONSHIP_TYPE_FRIEND WHERE true AND true]-(b)-[  UNNAMED92:RELATIONSHIP_TYPE_FRIEND WHERE true AND true]-(c)", _rows=15470, _db_hits=15547)

==>           ParameterPipe(_rows=1, _db_hits=0)

Upvotes: 0

Views: 53

Answers (2)

Max De Marzi

Reputation: 1108

Try a query that looks more like this on Neo4j 2.2:

START me=node:node_auto_index(userId='32887522')
MATCH (me)-[:RELATIONSHIP_TYPE_FRIEND]-(people)
WITH me, COLLECT(people) as friends
MATCH (me)-[:RELATIONSHIP_TYPE_FRIEND]-(people)-[:RELATIONSHIP_TYPE_FRIEND]-(fof)
WHERE me <> fof 
WITH me, fof, COUNT(*) AS freq, friends
WHERE NOT (fof IN friends)
WITH fof, freq
RETURN fof.userId, freq
ORDER BY freq DESC 
LIMIT 10

This is much closer to the optimal Java way of doing it: http://maxdemarzi.com/2014/04/24/translating-cypher-to-neo4j-java-api-2-0/
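To make the logic of that query easier to follow, here is a minimal in-memory sketch in Python of the same three steps: collect the direct friends once, count how often each friend-of-friend is reached, then drop anyone who is already a friend. It is only an illustration, not Neo4j code; the `recommend_friends` function, the `toy_graph` data, and the dict-of-sets graph representation are all made up for this example.

```python
from collections import Counter


def recommend_friends(graph, me, limit=10):
    """Rank friends-of-friends of `me` by number of common friends.

    `graph` is a hypothetical adjacency map: userId -> set of friend userIds,
    with every friendship stored in both directions (undirected).
    """
    # Step 1: collect the direct friends once (COLLECT(people) AS friends).
    friends = graph.get(me, set())

    # Step 2: walk two hops and count how often each fof is reached
    # (COUNT(*) AS freq), skipping the start node (WHERE me <> fof).
    freq = Counter()
    for friend in friends:
        for fof in graph.get(friend, set()):
            if fof != me:
                freq[fof] += 1

    # Step 3: filter out existing friends with a cheap set lookup
    # (WHERE NOT (fof IN friends)) instead of re-checking relationships.
    candidates = [(fof, n) for fof, n in freq.items() if fof not in friends]

    # ORDER BY freq DESC, with userId as a tie-breaker for stable output.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return candidates[:limit]


# Hypothetical toy data, only for illustration.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d", "e"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": {"b"},
}

print(recommend_friends(toy_graph, "a"))
```

The point of the sketch is the third step: the "already a friend" check is done once per aggregated candidate against a collection built up front, rather than as a pattern check on every two-hop path.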

Upvotes: 1

ron

Reputation: 1

I simplified the Cypher to:

START me=node:node_auto_index(userId='32887522')
MATCH (me)-[:RELATIONSHIP_TYPE_FRIEND]-(people)-[:RELATIONSHIP_TYPE_FRIEND]-(fof)
RETURN fof, count(*) AS commonFriends
ORDER BY commonFriends DESC
LIMIT 100;

and got this execution plan:

ColumnFilter(symKeys=["fof", " INTERNAL_AGGREGATEeaff758c-8eda-498a-9366-9965f62d16fc"], returnItemNames=["fof", "commonFriends"], _rows=100, _db_hits=0)

==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATEeaff758c-8eda-498a-9366-9965f62d16fc of type Long),false)"], limit="Literal", _rows=100, _db_hits=0)

==> EagerAggregation(keys=["fof"], aggregates=["( INTERNAL_AGGREGATEeaff758c-8eda-498a-9366-9965f62d16fc,CountStar)"], _rows=3661, _db_hits=0)

==> TraversalMatcher(trail="(me)-[ UNNAMED62:RELATIONSHIP_TYPE_FRIEND WHERE true AND true]-(people)-[ UNNAMED99:RELATIONSHIP_TYPE_FRIEND WHERE true AND true]-(fof)", _rows=15470, _db_hits=15547)

==> ParameterPipe(_rows=1, _db_hits=0)

It is still very slow, and I don't know why.

Upvotes: 0
