Reputation: 3251
My problem is code that seems to be doing something and then stops w/no messaging in the message.log file or the browser.
Update: What seems to be happening is the server is completing its work, but the browser's not getting the notification and reporting the result.
I'm running the following code:
USING PERIODIC COMMIT 500
LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" as line
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq,kbtext: line.kbtext})
with kt
match (kb:KBase {kbid: kt.kbid})
with split(tolower(kt.kbtext), " ") as words, kb, kt
with [w in words WHERE NOT w in ["the", "and", "i", "to", "or", "", "Knowledge", "Article"]] as txt, kb, kt
foreach (wd in txt |
merge (v:Vocabulary {word: wd})
merge (kt)-[:WORD]->(v)
merge (v)-[:KB]->(kb)
)
with txt, kb, kt
unwind range(0, size(txt)-2) as wordnum
merge (kbs1:KbSentence {kbid: kb.kbid, word: txt[wordnum], seq: wordnum})
merge (kbs2:KbSentence {kbid: kb.kbid, word: txt[wordnum+1], seq: wordnum+1})
merge (kbs1)-[:NEXT]->(kbs2)
merge (kbs1)-[:TEXT]->(kt)
merge (kbs2)-[:TEXT]->(kt)
merge (kt)-[:WORDSEQ]->(kbs1)
merge (kt)-[:WORDSEQ]->(kbs2)
:schema is:
Indexes ON :ErrLink(kbid) ONLINE ON :ErrLink(errnum) ONLINE ON :KBase(kbid) ONLINE ON :KBase(groupcode) ONLINE ON :KbGroup(groupcode) ONLINE ON :KbGroup(kbgroup) ONLINE ON :KbLink(kbid) ONLINE ON :KbLong(kbid) ONLINE ON :KbSentence(kbid) ONLINE ON :KbSentence(seq) ONLINE ON :KbSentence(word) ONLINE ON :KbText(kbid) ONLINE ON :KbTextWord(kbid) ONLINE ON :KbTextWord(word) ONLINE ON :KbTxtWord(kbid) ONLINE ON :ProError(errnum) ONLINE ON :Vocabulary(word) ONLINE No constraints
and for a while I'm seeing activity in a transaction log, and then it stops. At the time that happens, the following messages appears in the log file:
2016-03-17 12:32:03.234+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 113992 taken from index containing 113992 entries 2016-03-17 12:32:23.244+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 123992 taken from index containing 123992 entries 2016-03-17 12:32:43.349+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 132992 taken from index containing 132992 entries 2016-03-17 12:33:03.247+0000 INFO [o.n.k.i.a.i.s.OnlineIndexSamplingJob] Sampled index :KbText(kbid) with 36998 unique values in sample of avg size 143992 taken from index containing 143992 entries 2016-03-17 12:36:13.308+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Starting check pointing... 2016-03-17 12:36:13.308+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Starting store flush... 2016-03-17 12:36:13.721+0000 INFO [o.n.k.i.s.c.CountsTracker] About to rotate counts store at transaction 18762 to [D:\OpenData\ProKB\Neo4j\neostore.counts.db.a], from [D:\OpenData\ProKB\Neo4j\neostore.counts.db.b]. 2016-03-17 12:36:13.743+0000 INFO [o.n.k.i.s.c.CountsTracker] Successfully rotated counts store at transaction 18762 to [D:\OpenData\ProKB\Neo4j\neostore.counts.db.a], from [D:\OpenData\ProKB\Neo4j\neostore.counts.db.b]. 2016-03-17 12:36:13.915+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Store flush completed 2016-03-17 12:36:13.915+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Starting appending check point entry into the tx log... 2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Appending check point entry into the tx log completed 2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by scheduler for time threshold [18762]: Check pointing completed 2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [41]: Starting log pruning. 2016-03-17 12:36:13.988+0000 INFO [o.n.k.i.t.l.p.LogPruningImpl] Log Rotation [41]: Log pruning complete.
The browser is showing the spinning circle of dots, and if I open another browser to the database the new labels / relationships this code is supposed to create - aren't there.
How can I track what's going on during the load so I can see if it's doing anything and adjust the code if needed?
Upvotes: 0
Views: 170
Reputation: 39915
Your LOAD CSV
command shows up an eager pipe
in the query plan, verify this using:
explain LOAD CSV WITH HEADERS FROM "file:/D:/OpenData/ProKB/tmp/KbText.csv" as line
CREATE (kt:KbText {kbid: line.kbid, seq: line.seq,kbtext: line.kbtext})
with kt
match (kb:KBase {kbid: kt.kbid})
with split(tolower(kt.kbtext), " ") as words, kb, kt
with [w in words WHERE NOT w in ["the", "and", "i", "to", "or", "", "Knowledge", "Article"]] as txt, kb, kt
foreach (wd in txt |
merge (v:Vocabulary {word: wd})
merge (kt)-[:WORD]->(v)
merge (v)-[:KB]->(kb)
)
with txt, kb, kt
unwind range(0, size(txt)-2) as wordnum
merge (kbs1:KbSentence {kbid: kb.kbid, word: txt[wordnum], seq: wordnum})
merge (kbs2:KbSentence {kbid: kb.kbid, word: txt[wordnum+1], seq: wordnum+1})
merge (kbs1)-[:NEXT]->(kbs2)
merge (kbs1)-[:TEXT]->(kt)
merge (kbs2)-[:TEXT]->(kt)
merge (kt)-[:WORDSEQ]->(kbs1)
merge (kt)-[:WORDSEQ]->(kbs2)
The eager pipe prevents doing a periodic commit, which means the full file is processed in one single large transaction. Since transaction need to be built up in memory first before flush out to disc upon commit
you need to have enough memory. In most cases you don't and therefore the JVM might lock up and get stuck in doing garbage collections.
There's couple of blog posts out there on this topic, e.g. http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/.
The workaround is to split up the statement into smaller chunks that do not show eager
and run each of them separately - which of course means you're iterating the csv file multiple times.
Upvotes: 1