Trying to load Wikidata latest-truthy.nt with tdb2.tdbloader results in Code: 58/PROHIBITED_COMPONENT_PRESENT in USER

Question

With Apache Jena Fuseki I am trying to load the latest-truthy.nt dataset from Wikidata, but the import fails with the error below. I was inspired by a report from Bitplan, who managed to import the same dataset successfully.

Error log:

14:36:16 INFO  loader          :: Add: 198.500.000 latest-truthy.nt (Batch: 453.309 / Avg: 213.382)
14:36:17 ERROR riot            :: [line: 198884173, col: 87] Bad IRI:  Code: 58/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
org.apache.jena.riot.RiotException: [line: 198884173, col: 87] Bad IRI:  Code: 58/PROHIBITED_COMPONENT_PRESENT in USER: A component that is prohibited by the scheme is present.
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:146)
    at org.apache.jena.riot.system.ParserProfileStd.internalMakeIRI(ParserProfileStd.java:112)
    at org.apache.jena.riot.system.ParserProfileStd.resolveIRI(ParserProfileStd.java:85)
    at org.apache.jena.riot.system.ParserProfileStd.createURI(ParserProfileStd.java:187)
    at org.apache.jena.riot.system.ParserProfileStd.create(ParserProfileStd.java:259)
    at org.apache.jena.riot.lang.LangNTriples.tokenAsNode(LangNTriples.java:70)
    at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:109)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:184)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:357)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:323)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:298)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:550)
    at org.apache.jena.tdb2.loader.base.LoaderOps.inputFile(LoaderOps.java:107)
    at org.apache.jena.tdb2.loader.base.LoaderBase.loadOne(LoaderBase.java:125)
    at org.apache.jena.tdb2.loader.base.LoaderBase.lambda$load$0(LoaderBase.java:102)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at org.apache.jena.tdb2.loader.base.LoaderBase.load(LoaderBase.java:99)
    at tdb2.tdbloader.lambda$execBulkLoad$4(tdbloader.java:196)
    at org.apache.jena.atlas.lib.Timer.time(Timer.java:85)
    at tdb2.tdbloader.execBulkLoad(tdbloader.java:194)
    at tdb2.tdbloader.loadQuads(tdbloader.java:175)
    at tdb2.tdbloader.exec(tdbloader.java:136)
    at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
    at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
    at tdb2.tdbloader.main(tdbloader.java:64)

Script to import:

@ECHO off
cd apache-jena-4.0.0
echo start import on %DATE% %TIME%

tdb2_tdbloader --loader=parallel --loc "C:\fuseki\data" "F:\latest-truthy.nt" > tdb2-out.log 2> tdb2-err.log

echo finish import on %DATE% %TIME%
pause

File structure:

- C:/fuseki/
-- apache-jena-4.0.0/
-- apache-jena-fuseki-4.0.0/
-- data/
-- startfusekidb.bat
-- wikidata2fuseki.bat

- F:/
-- latest-truthy.nt

Is this an issue with Fuseki? The file is too large for me to open and fix the offending line by hand. Are there any flags I can use so that tdbloader skips validation for this import?

I am also asking this in the IRC channel of Wikidata to see if they might be able to help me.

UPDATE: I got an answer from someone on IRC, who told me that the dataset contains a whole lot of errors (see "Errors in Wikidata"). So I now need to find a way to skip the lines with errors and continue loading. But the Fuseki TDB2 Commands documentation doesn't show anything that helps.

Also, the help output of the command is shown below, which seems to indicate that no option for skipping bad lines exists?

c:\fuseki\apache-jena-4.0.0\bin>tdb2_tdbloader -h
tdbloader--loader= [--desc DATASET | --loc DIR] FILE ...
  Location
      --loc=DIR              Location (a directory)
      --tdb=                 Assembler description file
      --graph=IRI            Act on a named graph
      --loader=              Loader to use: 'basic', 'phased' (default), 'sequential', 'parallel' or 'light'
      --syntax=LANG          Syntax of data from stdin
  Symbol definition
      --set                  Set a configuration symbol to a value
      --mem=FILE             Execute on an in-memory TDB database (for testing)
      --desc=                Assembler description file
  General
      -v   --verbose         Verbose
      -q   --quiet           Run with minimal output
      --debug                Output information for debugging
      --help
      --version              Version information
      --strict               Operate in strict SPARQL mode (no extensions of any kind)
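
Since the help output above lists no option for skipping bad lines, one possible workaround is to pre-filter the dump before loading it. The sketch below is only an illustration under assumptions: it reads the message "PROHIBITED_COMPONENT_PRESENT in USER" as meaning that an http(s) IRI contains a user@ part in its authority (the offending line itself was not inspected), and it assumes one triple per line as in N-Triples. The function names and the output file names are hypothetical. It only targets this one class of error, so other bad IRIs in the dump could still stop the load.

```python
import sys
from urllib.parse import urlsplit


def iri_terms(line):
    """Return the IRIs in subject, predicate and object position of one N-Triples line."""
    parts = line.split(None, 2)
    if len(parts) < 3:
        return []  # blank line, comment, or something that is not a triple
    terms = parts[:2]
    obj = parts[2]
    if obj.startswith("<"):
        # The object is an IRI: keep everything up to its closing bracket.
        terms.append(obj.split(">", 1)[0] + ">")
    # Literals and blank nodes are ignored; only <...> terms are returned.
    return [t[1:-1] for t in terms if t.startswith("<") and t.endswith(">")]


def has_userinfo(iri):
    """True if an http(s) IRI has a user@ part in its authority (assumed cause of the error)."""
    try:
        parts = urlsplit(iri)
    except ValueError:
        return False  # unparsable for urlsplit: not the case this sketch targets
    return parts.scheme in ("http", "https") and "@" in parts.netloc


def is_suspect(line):
    """True if any IRI term of the line carries a user@ part."""
    return any(has_userinfo(iri) for iri in iri_terms(line))


def filter_file(src, dst, rejects):
    """Stream src line by line; write clean lines to dst and suspect lines to rejects."""
    kept = dropped = 0
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout, \
         open(rejects, "w", encoding="utf-8", newline="") as frej:
        for line in fin:
            if is_suspect(line):
                frej.write(line)
                dropped += 1
            else:
                fout.write(line)
                kept += 1
    return kept, dropped


# Run only when the three paths (input, filtered output, rejects) are given.
if __name__ == "__main__" and len(sys.argv) == 4:
    kept, dropped = filter_file(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"kept {kept} lines, dropped {dropped} lines", file=sys.stderr)
```

Saved under a hypothetical name such as filter_nt.py, it could be run as `python filter_nt.py F:\latest-truthy.nt F:\latest-truthy.filtered.nt F:\rejected.nt`, after which the filtered file would be passed to the same tdb2_tdbloader command. The dropped lines are kept in a separate file so they can be reviewed or repaired later instead of being lost silently.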
