Reputation: 1311
If I have a tree modelled in Neo4J, and I know all the indexed node properties of a possible subtree, how can I quickly do a match to confirm if that subtree exists?
For an easy to understand example, if my tree is a representation of all the directories and files on a disk, and what I want to confirm is the existence of a file, here's what I'm doing now to test if /a/b/c/d/e/f/g/h/i/j/k/file.txt exists in a graph where my filesystem nodes have label mylabel
and have an indexed basename property:
MATCH (root:`mylabel` { basename: '/' })-[:contains]->(`1`:`mylabel` { basename: 'a' })-[:contains]->(`2`:`mylabel` { basename: 'b' })-[:contains]->(`3`:`mylabel` { basename: 'c' })-[:contains]->(`4`:`mylabel` { basename: 'd' })-[:contains]->(`5`:`mylabel` { basename: 'e' })-[:contains]->(`6`:`mylabel` { basename: 'f' })-[:contains]->(`7`:`mylabel` { basename: 'g' })-[:contains]->(`8`:`mylabel ` { basename: 'h' })-[:contains]->(`9`:`mylabel ` { basename: 'i' })-[:contains]->(`10`:`mylabel` { basename: 'j' })-[:contains]->(`11`:`mylabel` { basename: 'k' })-[:contains]->(leaf:`vdp`:`mylabel` { basename: 'file.txt' }) RETURN leaf
I also happen to have an absolute path property on the leaf node, but I can't just search for that because I want to use a similar query to do a merge if I'm adding a file, and in case the file has moved (by deleting the relationship of leaf to k, and adding a new relationship of leaf to some other directory node) without the path property getting updated. Also, I can have the same absolute path multiple times for different files, because it can be for different disks (this is modelled by the root node having a basename that includes the disk name, but that isn't seen in the leaf nodes path property).
The above query takes 2-7 seconds to run (uncached). Is there any way to do something like this quicker? When I specify every node in the path I hoped Neo4J could cope even with paths of length 12+.
Upvotes: 3
Views: 554
Reputation: 39915
Some ideas here:
Root
and make sure it's the only node carrying that labelmylabel
label, so performance wise it's better to skip the check (since it's implicitly given).
MATCH (r:Root)
WITH r
MATCH (r)-[:contains]->(a { basename: 'a' })
-[:contains]->( { basename: 'b' })
-[:contains]->( { basename: 'c' })
-[:contains]->( { basename: 'd' })
-[:contains]->( { basename: 'e' })
-[:contains]->( { basename: 'f' })
-[:contains]->( { basename: 'g' })
-[:contains]->( { basename: 'h' })
-[:contains]->( { basename: 'i' })
-[:contains]->( { basename: 'j' })
-[:contains]->( { basename: 'k' })
-[:contains]->(leaf:`vdp`:`mylabel` { basename: 'file.txt' })
RETURN leaf
In case you don't know the depth of your leaf beforehand you can use variable path length matchers:
match (r:Root)
with r
match p=(r)-[:contains*]->()
with p,[x in tail(nodes(p))|x.basename] as names
where names= ["a","b","c","d","e","f","g","h","i","j","k","file.txt"]
return p, names
However this statement will fully traverse your while directory tree and does not an early skip approach. You can do that by using e.g. in Java using traversal API.
Upvotes: 2