Reputation: 2116
I'm trying to preload a TriG file that is around 24 GB in size. Below is the command I am using.
docker run -v $(pwd)/graphdb-data:/opt/graphdb/home \
-v $(pwd)/preload:/opt/graphdb-import \
--entrypoint /opt/graphdb/dist/bin/importrdf \
-e GDB_JAVA_OPTS="-Xmx12g -Xms2g" \
-e graphdb.page.cache.size=512m \
-e graphdb.workers.limit=2 \
-e graphdb.query.evaluation.mode=disk \
-e graphdb.repository.index.enable=false \
-e graphdb.compression.enabled=true \
-e graphdb.use.native.jena.model=true -e graphdb.verify-literals=false \
ontotext/graphdb:10.7.1 \
preload -s --force --recursive -q /tmp -c /opt/graphdb-import/graphdb-repo.ttl /opt/graphdb-import/backup.trig
It seems to ingest the data; however, at the very end I get the following error:
12:05:31.663 [resolver] INFO c.ontotext.graphdb.importrdf.Preload - 310,000,000 statements ...
12:06:30.853 [resolver] INFO c.ontotext.graphdb.importrdf.Preload - 320,000,000 statements ...
12:06:50.697 [monitor file position] INFO c.ontotext.graphdb.importrdf.Preload - File backup.trig processed to position 23,774,363,648 from 26,664,959,488 bytes
12:07:54.327 [resolver] INFO c.ontotext.graphdb.importrdf.Preload - 330,000,000 statements ...
12:08:50.702 [monitor file position] INFO c.ontotext.graphdb.importrdf.Preload - File backup.trig processed to position 25,267,535,872 from 26,664,959,488 bytes
12:09:28.692 [resolver] INFO c.ontotext.graphdb.importrdf.Preload - 340,000,000 statements ...
java.lang.NumberFormatException: empty String
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.base/java.lang.Double.parseDouble(Double.java:543)
at org.eclipse.rdf4j.model.base.AbstractLiteral$NumberLiteral.parseDouble(AbstractLiteral.java:367)
at java.base/java.util.Optional.map(Optional.java:265)
at org.eclipse.rdf4j.model.base.AbstractLiteral.value(AbstractLiteral.java:100)
at org.eclipse.rdf4j.model.base.AbstractLiteral.doubleValue(AbstractLiteral.java:141)
at com.ontotext.graphdb.importrdf.Preload.processLiteral(Preload.java:2124)
at com.ontotext.graphdb.importrdf.Resolver.write(Resolver.java:61)
at com.ontotext.graphdb.importrdf.Resolver.createid(Resolver.java:124)
at com.ontotext.graphdb.importrdf.Resolver.run(Resolver.java:203)
java.lang.InterruptedException
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090)
at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
at com.ontotext.graphdb.importrdf.Preload$LocalHandler.handleStatement(Preload.java:434)
at org.eclipse.rdf4j.rio.trig.TriGParser.reportStatement(TriGParser.java:248)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObject(TurtleParser.java:453)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:374)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:347)
at org.eclipse.rdf4j.rio.trig.TriGParser.parseTriples(TriGParser.java:236)
at org.eclipse.rdf4j.rio.trig.TriGParser.parseGraph(TriGParser.java:163)
at org.eclipse.rdf4j.rio.trig.TriGParser.parseStatement(TriGParser.java:115)
at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:164)
at org.eclipse.rdf4j.repository.util.RDFLoader.loadInputStreamOrReader(RDFLoader.java:304)
at org.eclipse.rdf4j.repository.util.RDFLoader.load(RDFLoader.java:249)
at com.ontotext.load.GraphdbRDFLoader.load(GraphdbRDFLoader.java:89)
at com.ontotext.graphdb.importrdf.Preload.processSingleFileInternal(Preload.java:2103)
at com.ontotext.graphdb.importrdf.Preload.lambda$processSingleFile$22(Preload.java:2053)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
12:10:32.001 [sorting] INFO c.ontotext.graphdb.importrdf.Preload - Sorter thread finished.
This file is a direct backup of a dev instance. I'm looking for a way to ignore these errors, or the type inference on literals that triggers them. Is it possible to ignore such errors, or to ingest only the valid data, with the preload command in GraphDB?
Upvotes: 0
Views: 60
Reputation: 1140
You can try loading the data with the -s option removed and -p added. Those options are described as follows:
-p,--partialLoad allow partial load of file that contains corrupt line
-s,--stopOnFirstError stop process if the dataset contains a corrupt file
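As a rough sketch of what that change would look like, here is the command from the question with only the flag swapped on the last line. Everything else is copied unchanged from the question, and I have not verified that -p skips this particular NumberFormatException, so treat it as something to try rather than a confirmed fix.

```shell
# Same invocation as in the question; the only change is on the last line,
# where -s (--stopOnFirstError) is replaced with -p (--partialLoad).
# Assumption: -p lets preload continue past the literal that fails to parse.
docker run -v "$(pwd)/graphdb-data:/opt/graphdb/home" \
  -v "$(pwd)/preload:/opt/graphdb-import" \
  --entrypoint /opt/graphdb/dist/bin/importrdf \
  -e GDB_JAVA_OPTS="-Xmx12g -Xms2g" \
  -e graphdb.page.cache.size=512m \
  -e graphdb.workers.limit=2 \
  -e graphdb.query.evaluation.mode=disk \
  -e graphdb.repository.index.enable=false \
  -e graphdb.compression.enabled=true \
  -e graphdb.use.native.jena.model=true -e graphdb.verify-literals=false \
  ontotext/graphdb:10.7.1 \
  preload -p --force --recursive -q /tmp -c /opt/graphdb-import/graphdb-repo.ttl /opt/graphdb-import/backup.trig
```

If the load completes, it is worth comparing the final statement count against the source instance, since a partial load may drop the offending lines.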
Upvotes: 0