user2868952

Reputation: 1

Query in Neo4j slow

I am new to Neo4j and have set up a test graph database to organize some clickstream data, using a very small subset of what we handle on a day-to-day basis. The graph has about 23 million nodes and 34 million relationships. Queries seem to take forever to run: I haven't seen a response come back even after waiting for more than 30 minutes.

The data is organized as Year->Month->Day->Session{1..n}->Event{1..n}

I am running the database on a Windows 7 machine with 1.5 GB of heap allocated to the Neo4j server.

These are the settings in neo4j-wrapper.conf:

wrapper.java.additional.1=-Dorg.neo4j.server.properties=conf/neo4j-server.properties
wrapper.java.additional.2=-Djava.util.logging.config.file=conf/logging.properties
wrapper.java.additional.3=-Dlog4j.configuration=file:conf/log4j.properties

wrapper.java.additional.6=-XX:+UseParNewGC
wrapper.java.additional.7=-XX:+UseConcMarkSweepGC
wrapper.java.additional.8=-Xloggc:data/log/neo4j-gc.log
wrapper.java.initmemory=1500
wrapper.java.maxmemory=1500

This is what my query looks like:

START n=node(3)
MATCH (n)-[:HAS]->(s)
WITH distinct s
MATCH (s)-[:HAS]->(e) WHERE e.page_name = 'Login'
WITH s.session_id as session, e
MATCH (e)-[:FOLLOWEDBY*0..1]->(e1) 
WITH count(session) as session_cnt, e.page_name as startPage, e1.page_name as nextPage
RETURN startPage, nextPage, session_cnt

I also have these properties set:

node_auto_indexing=true
node_keys_indexable=name,page_name,geo_country
relationship_auto_indexing=true

Can anyone help me figure out what might be wrong?

Even when I run only portions of the query, it takes 10-15 minutes before I see a response.

Note: I have no other applications running on the Windows machine.

Upvotes: 0

Views: 263

Answers (1)

Roy Awill

Reputation: 192

Why would you want to return all the nodes in the first place?

If you really want to do that, use the transactional HTTP endpoint and curl to stream the response:
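A minimal sketch of such a request, written in Python rather than curl so that it is self-contained. It assumes a default local server at localhost:7474 and the Neo4j 2.x transactional endpoint path /db/data/transaction/commit; the function names, the output file name, and the statement in the usage comment are illustrative, not taken from the original answer.

```python
import json
import urllib.request

# Assumption: default local server and the Neo4j 2.x transactional endpoint path.
ENDPOINT = "http://localhost:7474/db/data/transaction/commit"


def build_request(statement, endpoint=ENDPOINT):
    """Build a POST request that runs one Cypher statement and commits."""
    payload = json.dumps({"statements": [{"statement": statement}]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def stream_to_file(request, path, chunk_size=64 * 1024):
    """Copy the response body to a file chunk by chunk; return bytes written."""
    written = 0
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


# Usage (needs a running server; the statement is only an example):
#   req = build_request("MATCH (n) RETURN id(n)")
#   print(stream_to_file(req, "nodes.json"), "bytes written")
```

Reading the body in fixed-size chunks keeps client memory flat regardless of how many rows come back, which is the point of streaming the result instead of loading it in one piece.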

I tested it with a database of 100k nodes. It takes 0.9 seconds to transfer them (1.5 MB) over the wire. If you transfer all their properties by using "return n", it takes 1.4 seconds and 4.1 MB are transferred.

If you just want to know how many nodes are in your database, use something like this instead:

match (n) return count(*);
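If the count is fetched through the transactional HTTP endpoint, the single value has to be pulled out of the JSON response. The sketch below assumes the Neo4j 2.x response layout (a "results" list whose entries hold "columns" and "data", plus an "errors" list); the function name and the sample response in the usage note are hypothetical.

```python
import json


def single_value(response_text):
    """Return the first column of the first row of the first statement's result."""
    body = json.loads(response_text)
    # The endpoint reports failures in the body, so check before reading rows.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["results"][0]["data"][0]["row"][0]
```

For an illustrative response such as {"results":[{"columns":["count(*)"],"data":[{"row":[100000]}]}],"errors":[]}, single_value returns the number stored in "row".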

Upvotes: 0
