iam.Carrot
iam.Carrot

Reputation: 5276

optimize cypher query fired from python

driverforneo4jforpython. I have a program that dynamically creates around 10-12 queries . The final result from all queries is collected in alist` and returned.

Below are 10 such queries:

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(v4)-[:CHILD_OF]->(v3)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN st, sslc, v4, v3, v2

MATCH  (slc:subLocality)-[:CHILD_OF]->(v3)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (slc.name_wr= 'xyzSLC' OR slc.name_wr= 'abcxyzcolony')  RETURN st, slc, v3, v2

MATCH  (loc:locality)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (loc.name_wr= 'deltax' OR loc.name_wr= 'xyzSLC' OR loc.name_wr= 'abcxyzcolony')  RETURN st, loc, v2

MATCH  (ct:city)-[:CHILD_OF]->(st:state)  WHERE (st.name_wr = 'abcState') AND (ct.name_wr= 'deltax' OR ct.name_wr= 'abcxyz')  RETURN st, ct

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(v3)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(ct:city)  WHERE (ct.name_wr = 'deltax' OR ct.name_wr = 'abcxyz') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN ct, sslc, v3, v2

MATCH  (slc:subLocality)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(ct:city)  WHERE (ct.name_wr = 'deltax' OR ct.name_wr = 'abcxyz') AND (slc.name_wr= 'xyzSLC' OR slc.name_wr= 'abcxyzcolony')  RETURN ct, slc, v2

MATCH  (loc:locality)-[:CHILD_OF]->(ct:city)  WHERE (ct.name_wr = 'deltax' OR ct.name_wr = 'abcxyz') AND (loc.name_wr= 'deltax' OR loc.name_wr= 'xyzSLC' OR loc.name_wr= 'abcxyzcolony')  RETURN ct, loc

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(v2)-[:CHILD_OF]->(loc:locality)  WHERE (loc.name_wr = 'deltax' OR loc.name_wr = 'xyzSLC' OR loc.name_wr = 'abcxyzcolony') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN loc, sslc, v2

MATCH  (slc:subLocality)-[:CHILD_OF]->(loc:locality)  WHERE (loc.name_wr = 'deltax' OR loc.name_wr = 'xyzSLC' OR loc.name_wr = 'abcxyzcolony') AND (slc.name_wr= 'xyzSLC' OR slc.name_wr= 'abcxyzcolony')  RETURN loc, slc

MATCH  (sslc:subSubLocality)-[:CHILD_OF]->(slc:subLocality)  WHERE (slc.name_wr = 'xyzSLC' OR slc.name_wr = 'abcxyzcolony') AND (sslc.name_wr= 'xyzSLC' OR sslc.name_wr= 'abcxyzcolony')  RETURN slc, sslc

The Queries might change based on the input dictionary (as I mentioned the queries are created at run-time). But the queries share the same structure.

Below is a Query Plan that I get and it remains the same for all queries just differs in values inside.

query 1

Below is my code that fires up these requests:

def get_query_response(query_list: list)-> list:
    driver = GraphDatabase.driver(uri, auth=("neo4j", "neo4j"))
    with driver.session() as session:
        with session.begin_transaction() as tx:
            response = [record.values() for query in query_list for record in tx.run(query)]
            return response

The query_list is a collection of str that has these queries.

The problem is the whole task takes 2 seconds to give a response. Is there any way to optimize the query or make it faster or maybe operate in milliseconds?


Edit:

To answer a few questions:

Upvotes: 0

Views: 168

Answers (2)

cybersam
cybersam

Reputation: 66967

Add appropriate indexes or uniqueness constraints so that your generated queries do not need to scan for the appropriate nodes to start working.

For example (based on your examples), you could add indexes to:

  • :subSubLocality(name_wr)
  • :subLocality(name_wr)
  • :locality(name_wr)
  • :city(name_wr)

Upvotes: 1

Stephen Gheysens
Stephen Gheysens

Reputation: 303

I can't say for sure what the cause is, but I have a few questions that should help us get closer to an answer.

• Have you tried benchmarking these queries individually? At first glance, they look like they are simple enough to complete, so I don't think this is the issue but it wouldn't hurt to know if you really need to be optimizing the queries themselves.

• You mentioned it takes "2 seconds", is that from the moment you hit 'enter' to execute your Python script (so things like initiating the connection to the Neo4j instance are included), or does it specifically take 2.0 seconds for the queries to execute?

• The docs note that prior to v3.2 of Neo4j, the Cypher planner wasn't always making the most efficient choices. If you have an earlier version, the docs mention you should default to the cost-based planner.

• Is this a local Neo4j instance? If it's hosted, what are the hardware specs of the host machine? Might not hurt to bump up the specs if possible.

• If you haven't added any custom indexing on properties and your queries always look the same, I would recommend looking into that option.

Upvotes: 0

Related Questions