I'm working on a product recommendation query for the current customer. It finds similar customers (customers who viewed the same products) and then recommends other products those similar customers have viewed. Our business is consignment, so we only have one of every product; that's why I base similarity on views, which give a much larger dataset than purchases alone. I expect this query to run in well under a second, since our development environment currently holds only a little over 10k products and 10k users. I'm unsure whether the query needs tuning, the Linux/Java/Neo4j configuration, or both. Does anyone have experience with this?
MATCH (currentUser:websiteUser {uuid: 'ea1d35e7-73e6-4990-b7b5-2db24121da9b'})
      -[:VIEWED]->(i:websiteItem)<-[:VIEWED]-(similarUser:websiteUser)
      -[r:VIEWED]->(similarItem:websiteItem {active: true})
RETURN similarItem.designer, similarItem.title, similarItem.brandSize,
       similarItem.sku, similarItem.shopifyProductId, similarItem.url,
       similarItem.price, COUNT(distinct r) AS score
ORDER BY score DESC
LIMIT 15
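A rough way to see why a single pattern like this can be slow even on a small graph: the pattern produces one row per matched path, and COUNT only collapses them afterwards. Assuming at most one VIEWED relationship per user/item pair, and writing V(x) for the items user x viewed, V^{-1}(i) for the users who viewed item i, and A for the set of active items (notation introduced here, not from the original), the number of rows reaching the aggregation is approximately:

```latex
\text{rows} \;=\; \sum_{i \in V(u)} \;\; \sum_{\substack{s \in V^{-1}(i) \\ s \neq u}} \bigl|\, (V(s) \cap A) \setminus \{i\} \,\bigr|
```

Here u is the current user. A similar user who shares k viewed items with the current user is expanded k separate times, each time over their whole view history, so heavy viewers who overlap on many items dominate the row count.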
[Profile output (image not preserved)]
After further research, and after trying many suggestions from other posts about tuning both the machine and the query, I found that the following rewrite gives the best speed. It splits the query into steps and uses WITH distinct between them, so the results of one step don't balloon as they are expanded in the next.
MATCH (u:websiteUser{uuid: 'ea1d35e7-73e6-4990-b7b5-2db24121da9b'})
MATCH (u)-[:VIEWED]->(i:websiteItem)
WITH distinct i
MATCH (i)<-[:VIEWED]-(su:websiteUser)
WITH distinct su
MATCH (su)-[r:VIEWED]->(si:websiteItem {active: true})
RETURN si.designer, si.title, si.brandSize, si.sku, si.shopifyProductId,
si.url, si.price, COUNT(distinct r) AS score
ORDER BY score DESC
LIMIT 15
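To illustrate why the staged version helps, here is a small Python model of both queries on a hypothetical toy graph. The data, function names, and the assumptions that each user/item pair has at most one VIEWED relationship and that every item is active are mine, not from the original; Python stands in for Cypher only so the logic can run without a database. COUNT(distinct r) is modeled as the number of distinct (user, item) VIEWED edges per item.

```python
from collections import Counter

# Hypothetical toy graph: each user maps to the set of item ids they VIEWED.
# Assumes at most one VIEWED relationship per (user, item) pair and that every
# item is active, so the {active: true} filter is not modeled.
VIEWS = {
    "me": {"a", "b", "c"},
    "u1": {"a", "b", "c", "d", "e"},
    "u2": {"a", "f"},
}


def single_pattern(views, current):
    """Model of the original single-MATCH query.

    Returns (rows, scores): rows is the number of (i, su, si) paths produced
    before aggregation; scores counts distinct (su, si) edges per item.
    """
    rows = 0
    edges = set()
    for i in views[current]:
        for su, items in views.items():
            # Relationship uniqueness inside one MATCH: su's edge to i must
            # differ from current's edge to i, so su cannot be current here.
            if su == current or i not in items:
                continue
            for si in items:
                if si == i:  # r must differ from the (su)-[:VIEWED]->(i) edge
                    continue
                rows += 1
                edges.add((su, si))
    return rows, Counter(si for _, si in edges)


def staged(views, current):
    """Model of the rewritten query with WITH distinct between steps."""
    distinct_items = set(views[current])                      # WITH distinct i
    distinct_users = {su for su, items in views.items()
                      if items & distinct_items}               # WITH distinct su
    rows = sum(len(views[su]) for su in distinct_users)        # final expansion
    scores = Counter(si for su in distinct_users for si in views[su])
    return rows, scores


def top(scores, limit=15):
    # ORDER BY score DESC LIMIT n; ties broken by item id for determinism
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


if __name__ == "__main__":
    for name, fn in (("single MATCH", single_pattern), ("staged", staged)):
        rows, scores = fn(VIEWS, "me")
        print(name, rows, top(scores))
```

On this toy data the single pattern produces 13 rows and the staged version 10; the gap grows with how many items each similar user shares with the current user. The model also surfaces a behavioral difference: because each step of the rewrite is its own MATCH, the relationship-uniqueness rule no longer keeps u out of su, so the current user's own views contribute to score (item a ranks first with score 3). Filtering with WHERE su <> u, which requires carrying u through the WITH clauses, should exclude it, though I have not benchmarked that variant.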