Reputation: 876
I have the following sample data, http://console.neo4j.org/?id=ktfn9n, and I have 2 questions:
1. About the following query (it tries to find all the sub-paths through specific pages inside the Hits path):
MATCH (step1:Hit)
WHERE step1.page =~ '(?i)(.*home.*)'
MATCH (step2:Hit)
WHERE step2.page =~ '(?i)(.*register.*)'
MATCH (step3:Hit)
WHERE step3.page =~ '(?i)(.*buy.*)'
MATCH path=step1-[:NEXT*]->step2-[:NEXT*]->step3
WITH filter(n IN NODES(path)
WHERE n:Hit AND n.page =~ '(?i)(.*home.*|.*register.*|.*buy.*)') AS filtered
WITH extract(v IN filtered| { page:lower(v.page)}) AS ex UNWIND ex AS pages
WITH COLLECT(DISTINCT pages) AS hits
RETURN hits,count(hits) AS path_users_count
ORDER BY path_users_count DESC
As you can see in the result set in the console, the result is:
[ {page:"home"}, {page:"register"}, {page:"buy"}] 1
What I was expecting is:
[ {page:"home"}, {page:"register"}, {page:"buy"}] 2
since there are 2 paths with the flow of the 3 pages in the example (the 2 red lines in the attached image).
2. Second question:
Currently I'm including the page in each Hit object, which is a waste of resources. In the final result I want to take the page name from the Page node related to the Hit. (In the real database I have about 10 nodes related to each Hit, and I need to return 5 of them in the result object, so I don't think that can be included in the first MATCH, right?)
Upvotes: 1
Views: 342
Reputation: 10856
When you UNWIND and then collect(DISTINCT ...), you're unravelling your set of arrays into one flat list and then collecting it back into a single distinct list instead of one for each path match. If you include the path variable in your WITHs, you'll continue to keep them grouped by path:
MATCH (step1:Hit)
WHERE step1.page =~ '(?i)(.*home.*)'
MATCH (step2:Hit)
WHERE step2.page =~ '(?i)(.*register.*)'
MATCH (step3:Hit)
WHERE step3.page =~ '(?i)(.*buy.*)'
MATCH path=step1-[:NEXT*]->step2-[:NEXT*]->step3
WITH path, filter(n IN NODES(path)
WHERE n:Hit AND n.page =~ '(?i)(.*home.*|.*register.*|.*buy.*)') AS filtered
WITH path, extract(v IN filtered| {page:lower(v.page)}) AS ex
UNWIND ex AS pages
WITH path, COLLECT(DISTINCT pages) AS hits
RETURN hits,count(hits) AS path_users_count
ORDER BY path_users_count DESC
This returns 3 rather than the 2 you expected, though I think that might be correct, because the second path in the lower left of the graphic contains two sub-paths which match your criteria.
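To make the grouping behaviour easier to see outside Neo4j, here is a minimal Python sketch of the two aggregations. It is only a model, not how Cypher is implemented: the `matched_paths` data and all three function names are made up for illustration, and the three identical rows simply stand in for the three path matches mentioned above.

```python
# Hypothetical stand-in for the rows coming out of the MATCH: one list of
# (already filtered and lower-cased) page names per matched path.
matched_paths = [
    ["home", "register", "buy"],
    ["home", "register", "buy"],
    ["home", "register", "buy"],
]


def collect_distinct_flat(paths):
    """Model of UNWIND + COLLECT(DISTINCT ...) with no grouping key."""
    seen = []
    for pages in paths:  # UNWIND flattens every path into one stream of rows
        for page in pages:
            if page not in seen:  # DISTINCT drops repeats across all paths
                seen.append(page)
    return [seen]  # the aggregation yields a single row


def collect_distinct_per_path(paths):
    """Model of the same aggregation with `path` carried through each WITH."""
    rows = []
    for pages in paths:  # `path` acts as the grouping key: one row per path
        distinct = []
        for page in pages:
            if page not in distinct:
                distinct.append(page)
        rows.append(distinct)
    return rows


def count_by_hits(rows):
    """Model of RETURN hits, count(hits): group equal lists and count them."""
    counts = {}
    for hits in rows:
        key = tuple(hits)  # lists are not hashable, so use a tuple as the key
        counts[key] = counts.get(key, 0) + 1
    return counts


flat_counts = count_by_hits(collect_distinct_flat(matched_paths))
grouped_counts = count_by_hits(collect_distinct_per_path(matched_paths))
print(flat_counts)     # {('home', 'register', 'buy'): 1}
print(grouped_counts)  # {('home', 'register', 'buy'): 3}
```

Without a grouping key everything collapses into one list, so the count can only ever be 1. With the path kept as a key, each match contributes its own list and the count reflects the number of matched paths.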
I'm not sure I understand your second question. Do you mean that you want to have a Page label whose nodes your Hit nodes reference to say what page they were hitting? If so, I don't think that's a problem. I think you could change the beginning to this:
MATCH (step1:Hit)-[:HITS]->(page1:Page)
WHERE page1.url =~ '(?i)(.*home.*)'
MATCH (step2:Hit)-[:HITS]->(page2:Page)
WHERE page2.url =~ '(?i)(.*register.*)'
MATCH (step3:Hit)-[:HITS]->(page3:Page)
WHERE page3.url =~ '(?i)(.*buy.*)'
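and the rest would be the same, except that anything still reading n.page (the filter and extract steps) would need to get the value from the related Page node instead.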
Upvotes: 2