ugurtosun

Reputation: 327

Neo4j GraphSage training does not log anything

I am extracting graph embeddings by training the GraphSage algorithm on a large graph with 82,339,589 nodes and 219,521,164 edges. When I check with the ":queries" command, the query is listed as running. The algorithm started 6 days ago. When I look at the logs with "docker logs xxx", the last entries are:

2021-12-01 12:03:16.267+0000 INFO Relationship Store Scan (RelationshipScanCursorBasedScanner): Imported 352,492,468 records and 0 properties from 16247 MiB (17,036,668,320 bytes); took 59.057 s, 5,968,663.57 Relationships/s, 275 MiB/s (288,477,487 bytes/s) (per thread: 1,492,165.89 Relationships/s, 68 MiB/s (72,119,371 bytes/s))

2021-12-01 12:03:16.269+0000 INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING

INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:56143] ] LOADING Actual memory usage of the loaded graph: 8602 MiB

INFO [neo4j.BoltWorker-3 [bolt] [/10.0.0.6:64076] ] GraphSageTrain :: Start

Is there a way to see detailed logs about the training process? Is it normal for training to take 6 days on a graph of this size?

Upvotes: 1

Views: 101

Answers (1)

Nathan Smith

Reputation: 881

It is normal for GraphSAGE to take a long time compared to FastRP or Node2Vec. Starting in GDS 1.7, you can check the progress of a running job with

CALL gds.beta.listProgress(jobId: String)
YIELD
  jobId,
  taskName,
  progress,
  progressBar,
  status,
  timeStarted,
  elapsedTime

If you call without passing in a jobId, it will return a list of all running jobs. If you call with a jobId, it will give you details about a running job.

This query will summarize the details for job 03d90ed8-feba-4959-8cd2-cbd691d1da6c.

CALL gds.beta.listProgress("03d90ed8-feba-4959-8cd2-cbd691d1da6c")
YIELD taskName, status
RETURN taskName, status, count(*)
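
If you do not know the job ID yet, calling the procedure without arguments should list it. This is a minimal sketch, assuming GDS 1.7 or later and that the GraphSAGE training job is still running. The YIELD columns are taken from the signature above, and the choice of columns to return is only an illustration.

```cypher
// List all running jobs; no jobId is passed, so every job is returned.
CALL gds.beta.listProgress()
YIELD jobId, taskName, progress, elapsedTime
// Copy the jobId of the GraphSageTrain task into the detailed query.
RETURN jobId, taskName, progress, elapsedTime
```

The jobId returned for the training task can then be passed to the per-job call to see its individual subtasks and their status.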

Here's the documentation for progress logging. The system monitoring procedures might also be helpful to you.

Upvotes: 1
