Reputation: 3371
I am working on CYPHER for different queries and my Match
query takes too much time.
Query:
MATCH (a:FacebookComment), (b:FacebookLike)
WHERE a.post_id > 696
AND
a.post_id < 746
AND
a.id = toint(b.comment_id)
CREATE (a) <-[fpl:FB_COMMENT_LIKE]-(b)
There are 20117 nodes in FacebookLike and 6054 nodes in FacebookComment.
It takes 8 and half minutes to execute.
What can be issue in query or at any other point?
EDIT
I run PROFILE in cypher query
PROFILE
Match (a:FacebookComment) WHERE a.post_id > 696 AND a.post_id < 746 with a
Match (b:FacebookLike) WHERE a.id = toint(b.comment_id) with b, a
MERGE (a)<-[fpl:FB_COMMENT_LIKE]-(b)
It took 808676 ms.
Result:
And Cypher query:
PROFILE
Match (a:FacebookComment), (b:FacebookLike) WHERE a.id = toint(b.comment_id) AND a.post_id > 696 AND a.post_id < 746
MERGE (a)<-[fpl:FB_COMMENT_LIKE]-(b)
It took 800793 ms.
RESULT:
Upvotes: 1
Views: 657
Reputation: 30417
You might not be able to escape the long execution times when doing these kind of queries to create your relationships, setting up of relationships when all you have are ids and foreign keys can take time. That said, there are some great opportunities for improvement here.
Especially problematic is the toInt() you have to do on the :FacebookLike comment_id...that means that you can't take advantage of an index here. You can see that in your PROFILE, the Filter operation after the cartesian product, which does 372 million db hits; in order to do that id comparison, it must iterate over every single :FacebookLike, changing the id to an int, THEN doing the comparison. This is the most expensive part of your query.
If the :FacebookLike comment_id was already an int (that is, if toInt() isn't needed, and you left it in by mistake), or if you take some time to change every :FacebookLike comment_id to an int ahead of time, you should see a significant improvement in that filter.
That is also assuming that you create indexes on both :FacebookComment.post_id and :FacebookLike.comment_id. (also, make sure you're working with the right fields...you have post_ids, comment_ids, and ids in your query, could be easy to mix things up).
As for other improvements (AFTER you've solved the big one above), breaking up your query into multiple parts will help you avoid the cartesian product.
The full query might look like this:
MATCH (a:FacebookComment)
WHERE a.post_id > 696
AND a.post_id < 746
WITH a
MATCH (b:FacebookLike)
WHERE b.comment_id = a.id
WITH a, b
CREATE (a)<-[fpl:FB_COMMENT_LIKE]-(b)
Upvotes: 2