Lior Goldemberg

Reputation: 876

Neo4j Cypher: retrieve properties of nodes related to a path's nodes

I have the following sample data, http://console.neo4j.org/?id=ktfn9n, and I have two questions:

1. About the following query, which tries to find all the sub-paths inside the Hits path that pass through specific pages:

MATCH (step1:Hit)
WHERE step1.page =~ '(?i)(.*home.*)'
MATCH (step2:Hit)
WHERE step2.page =~ '(?i)(.*register.*)'
MATCH (step3:Hit)
WHERE step3.page =~ '(?i)(.*buy.*)'
MATCH path=step1-[:NEXT*]->step2-[:NEXT*]->step3
WITH filter(n IN NODES(path) 
        WHERE n:Hit AND n.page =~ '(?i)(.*home.*|.*register.*|.*buy.*)')   AS filtered
WITH extract(v IN filtered| { page:lower(v.page)}) AS ex UNWIND ex AS pages
WITH COLLECT(DISTINCT pages) AS hits
RETURN hits,count(hits) AS path_users_count
ORDER BY path_users_count DESC

As you can see in the result set in the console, the result is:

[ {page:"home"}, {page:"register"}, {page:"buy"}] 1

What I was expecting is:

[ {page:"home"}, {page:"register"}, {page:"buy"}] 2

since there are 2 paths with the flow of the 3 pages in the example (the 2 red lines in the attached image).

2. Second question:

Currently I'm including the page in each Hit object, which wastes resources. In the final result I want to take the page name from the Page node related to the Hit. (In the real database I have about 10 related nodes for each Hit, and I need to return 5 of them in the result object, so I don't think they can be included in the first MATCH, right?)

Upvotes: 1

Views: 342

Answers (1)

Brian Underwood

Reputation: 10856

When you UNWIND and then collect(DISTINCT ...), you're unravelling your set of arrays into one flat list and then collecting it back into a single distinct list, instead of one list for each path match. If you include the path variable in your WITH clauses, the rows stay grouped by path:

MATCH (step1:Hit)
WHERE step1.page =~ '(?i)(.*home.*)'
MATCH (step2:Hit)
WHERE step2.page =~ '(?i)(.*register.*)'
MATCH (step3:Hit)
WHERE step3.page =~ '(?i)(.*buy.*)'
MATCH path=step1-[:NEXT*]->step2-[:NEXT*]->step3
WITH path, filter(n IN NODES(path) 
        WHERE n:Hit AND n.page =~ '(?i)(.*home.*|.*register.*|.*buy.*)')   AS filtered
WITH path, extract(v IN filtered| {page:lower(v.page)}) AS ex
UNWIND ex AS pages
WITH path, COLLECT(DISTINCT pages) AS hits
RETURN hits,count(hits) AS path_users_count
ORDER BY path_users_count DESC

This returns 3, though I think that might be correct, because the second path in the lower left of the graphic contains two paths which match your criteria.
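One way to check where the count of 3 comes from is to list every matching path on its own row. The query below is only a sketch, written in the same older Cypher syntax as the queries above (`extract`, bare node identifiers in the path pattern); the aliases `pages_in_path` and `hops` are names chosen here for illustration.

```cypher
MATCH (step1:Hit)
WHERE step1.page =~ '(?i)(.*home.*)'
MATCH (step2:Hit)
WHERE step2.page =~ '(?i)(.*register.*)'
MATCH (step3:Hit)
WHERE step3.page =~ '(?i)(.*buy.*)'
MATCH path=step1-[:NEXT*]->step2-[:NEXT*]->step3
// One row per matching path: the pages it visits, in order, and its length
RETURN extract(n IN NODES(path) | n.page) AS pages_in_path,
       length(path) AS hops
ORDER BY hops
```

If two of the rows turn out to be sub-paths of the same chain of hits, that would account for the extra match.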

I'm not sure I understand your second question. Do you mean that you want a Page label whose nodes your Hit nodes reference to say which page they were hitting? If so, I don't think that's a problem. I think you could change the beginning to this:

MATCH (step1:Hit)-[:HITS]->(page1:Page)
WHERE page1.url =~ '(?i)(.*home.*)'
MATCH (step2:Hit)-[:HITS]->(page2:Page)
WHERE page2.url =~ '(?i)(.*register.*)'
MATCH (step3:Hit)-[:HITS]->(page3:Page)
WHERE page3.url =~ '(?i)(.*buy.*)'

and the rest would be the same.
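One caveat: if the page property is dropped from Hit entirely, the later filter on n.page no longer has anything to match. A possible variant, sketched below, builds the result directly from the matched Page nodes instead. It assumes the `HITS` relationship and the `url` property used above, neither of which is confirmed by the sample data, and it keeps the older Cypher syntax of the other queries.

```cypher
// Assumes a hypothetical (:Hit)-[:HITS]->(:Page) relationship and a
// Page.url property, as in the snippet above.
MATCH (step1:Hit)-[:HITS]->(page1:Page)
WHERE page1.url =~ '(?i)(.*home.*)'
MATCH (step2:Hit)-[:HITS]->(page2:Page)
WHERE page2.url =~ '(?i)(.*register.*)'
MATCH (step3:Hit)-[:HITS]->(page3:Page)
WHERE page3.url =~ '(?i)(.*buy.*)'
MATCH path=step1-[:NEXT*]->step2-[:NEXT*]->step3
// Each row is still one path; the page names now come from the Page nodes
WITH path,
     [{page: lower(page1.url)}, {page: lower(page2.url)}, {page: lower(page3.url)}] AS hits
RETURN hits, count(path) AS path_users_count
ORDER BY path_users_count DESC
```

Other nodes related to each Hit could probably be pulled in the same way, with one extra pattern per step, though that has not been tried against the real database.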

Upvotes: 2
