Reputation: 378
I am running a SPARQL* query against Jena's TDB where the result set (DBpedia logs) contains escape characters. To run the query I use org.apache.jena.query.QueryExecution as follows:
// Text block (Java 15+); the PREFIX declarations for vers: and xsd: are omitted here.
String query = """
    SELECT *
    WHERE
    { << ?s <http://dbpedia.org/property/cover> ?o >>
          vers:valid_from ?valid_from ;
          vers:valid_until ?valid_until
      BIND("2022-01-05T21:42:11.803+02:00"^^xsd:dateTime AS ?TimeOfExecution)
      FILTER ( ( ?valid_from <= ?TimeOfExecution ) && ( ?TimeOfExecution < ?valid_until ) )
    }
    """;
RDFConnection conn = RDFConnection.connect(String.format("http://localhost:%d/in_memory_server/sparql", server.getHttpPort()));
QueryExecution qExec = conn.query(query);
and I get the following exception:
Exception in thread "main" org.apache.jena.atlas.json.JsonParseException: illegal escape sequence value: f (0x66)
at org.apache.jena.atlas.json.io.parser.TokenizerJSON.exception(TokenizerJSON.java:757)
at org.apache.jena.atlas.json.io.parser.TokenizerJSON.exception(TokenizerJSON.java:749)
at org.apache.jena.atlas.json.io.parser.TokenizerJSON.readLiteralEscape(TokenizerJSON.java:669)
at org.apache.jena.atlas.json.io.parser.TokenizerJSON.allBetween(TokenizerJSON.java:559)
at org.apache.jena.atlas.json.io.parser.TokenizerJSON.parseToken(TokenizerJSON.java:138)
at org.apache.jena.atlas.json.io.parser.TokenizerJSON.hasNext(TokenizerJSON.java:75)
at org.apache.jena.atlas.iterator.PeekIterator.fill(PeekIterator.java:50)
at org.apache.jena.atlas.iterator.PeekIterator.next(PeekIterator.java:92)
at org.apache.jena.atlas.json.io.parser.JSONParserBase.nextToken(JSONParserBase.java:102)
at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:75)
at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
at org.apache.jena.atlas.json.io.parser.JSONP.parseArray(JSONP.java:143)
at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:98)
at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
at org.apache.jena.atlas.json.io.parser.JSONP.parseAny(JSONP.java:97)
at org.apache.jena.atlas.json.io.parser.JSONP.parseObject(JSONP.java:79)
at org.apache.jena.atlas.json.io.parser.JSONP.parse(JSONP.java:50)
at org.apache.jena.atlas.json.io.parser.JSONParser.parse(JSONParser.java:58)
at org.apache.jena.atlas.json.io.parser.JSONParser.parse(JSONParser.java:40)
at org.apache.jena.atlas.json.JSON._parse(JSON.java:126)
at org.apache.jena.atlas.json.JSON.parse(JSON.java:38)
at org.apache.jena.riot.resultset.rw.ResultSetReaderJSON$RS_JSON.parse(ResultSetReaderJSON.java:103)
at org.apache.jena.riot.resultset.rw.ResultSetReaderJSON.process(ResultSetReaderJSON.java:74)
at org.apache.jena.riot.resultset.rw.ResultSetReaderJSON.readAny(ResultSetReaderJSON.java:67)
at org.apache.jena.riot.resultset.rw.ResultsReader.readAny(ResultsReader.java:167)
at org.apache.jena.riot.resultset.rw.ResultsReader.readAny(ResultsReader.java:152)
at org.apache.jena.riot.ResultSetMgr.readAny(ResultSetMgr.java:191)
at org.apache.jena.riot.ResultSetMgr.read(ResultSetMgr.java:113)
at org.apache.jena.sparql.exec.http.QueryExecHTTP.execRowSet(QueryExecHTTP.java:195)
at org.apache.jena.sparql.exec.http.QueryExecHTTP.select(QueryExecHTTP.java:156)
at org.apache.jena.sparql.exec.QueryExecutionAdapter.execSelect(QueryExecutionAdapter.java:117)
at org.apache.jena.sparql.exec.QueryExecutionCompat.execSelect(QueryExecutionCompat.java:97)
at org.ai.wu.ac.at.tdbArchive.core.JenaTDBArchive_TB_star_f.materializeQuery(JenaTDBArchive_TB_star_f.java:295)
at org.ai.wu.ac.at.tdbArchive.core.JenaTDBArchive_TB_star_f.bulkAllMatQuerying(JenaTDBArchive_TB_star_f.java:258)
at org.ai.wu.ac.at.tdbArchive.tools.JenaTDBArchive_query.main(JenaTDBArchive_query.java:266)
The ?o of one row contains the following problematic string:
"{\
tf1\ansi\ansicpg1252{\onttbl}
{\colortbl;\
ed255\green255\lue255;"@en
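The string looks broken because it contains raw control characters that do not display. As a minimal sketch, assuming the endpoint serves the same value that rdflib prints as UTF-8 bytes further down in this post, the following standard-library Python lists those control characters and shows how a standard JSON encoder would escape them. The variable names are only illustrative, and whether the server emits exactly this JSON is an assumption.

```python
import json

# The literal's characters, taken from the UTF-8 bytes that rdflib prints for it
# later in this post (assumption: the SPARQL endpoint serves the same value).
value = b'{\\\rtf1\\ansi\\ansicpg1252{\\\x0conttbl}\n{\\colortbl;\\\red255\\green255\\\x08lue255;'.decode("utf-8")

# Code points below 0x20 are control characters that a terminal or web page hides.
controls = sorted({hex(ord(ch)) for ch in value if ord(ch) < 0x20})
print(controls)

# A standard JSON encoder writes these characters as two-character escapes.
encoded = json.dumps(value)
print(encoded)

# Python's own JSON parser reads the escapes back without complaint.
assert json.loads(encoded) == value
```

The encoded text contains the escape `\f` for the form feed, which would be consistent with the "illegal escape sequence value: f (0x66)" message if the client-side tokenizer does not accept that escape.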
Is there any property I can set to circumvent these escape characters or to tell Jena or ARQ to use a different parser?
For some reason I do not have this problem with a plain SPARQL query, only with the SPARQL* query. Can this make a difference? E.g., when I run the following SPARQL query, which delivers exactly the same result, just from an .nq RDF dataset (quads), I get no exception:
Select * WHERE
{ GRAPH <http://example.org/versions>
{ ?graph <http://www.w3.org/2002/07/owl#versionInfo> 92 }
GRAPH ?graph
{ ?s <http://dbpedia.org/property/cover> ?o }
}
UPDATE 1: The issue lies within the RDF dataset serialized as .ttl. To create the RDF dataset I use Python. The script takes an initial snapshot and changesets as input and builds a new RDF dataset with all the changes/versions included. I use the following snippet to parse and serialize the changesets:
from rdflib import Graph

cs_add = Graph()
cs_add.parse("path_to_changeset")
The issue seems to be in the parser. The string that gets parsed is:
"{\\rtf1\\ansi\\ansicpg1252{\\fonttbl}\n{\\colortbl;\\red255\\green255\\blue255;"@en
Now I want to serialize this string as-is: all special characters should be preserved and none of them should be escaped. This is what I get when I iterate through the triples and print the object Literal:
for s, p, o in cs_add:
    if "ansicpg1252" in o:  # just to catch the string
        print(o.encode('utf-8'))
        print()
        print(o.n3())
Output
b'{\\\rtf1\\ansi\\ansicpg1252{\\\x0conttbl}\n{\\colortbl;\\\red255\\green255\\\x08lue255;'
"""{\\\rtf1\\ansi\\ansicpg1252{\\onttbl}
{\\colortbl;\\\red255\\green255\lue255;"""@en
So we see that e.g. a third backslash is added. Now I would need to find an encoding that somehow preserves the string as it is.
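Read as Python escapes, `\\\r` in the byte output is one backslash followed by a carriage return, not three backslashes. One way such a value can arise, sketched here with the standard library only and not taken from rdflib's actual code, is a decoder that replaces letter escapes such as `\r` before it handles the escaped backslash `\\`. Both function names below are hypothetical.

```python
import re

# Turtle string escapes (ECHAR) and the characters they stand for.
ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
         '"': '"', "'": "'", "\\": "\\"}

# The literal as it appears in the changeset file (raw string, no Python escapes).
SOURCE = r'{\\rtf1\\ansi\\ansicpg1252{\\fonttbl}\n{\\colortbl;\\red255\\green255\\blue255;'

def unescape_single_pass(lexical):
    """Decode each escape exactly once while scanning left to right."""
    return re.sub(r"\\([tbnrf\"'\\])", lambda m: ECHAR[m.group(1)], lexical)

def unescape_letters_first(lexical):
    """Hypothetical faulty decoder: letter escapes are replaced before the escaped backslash."""
    out = lexical
    for letter in "tbnrf\"'":
        out = out.replace("\\" + letter, ECHAR[letter])
    return out.replace("\\\\", "\\")

good = unescape_single_pass(SOURCE)
bad = unescape_letters_first(SOURCE)

print(good.encode("utf-8"))  # backslashes kept, only the real \n becomes a newline
print(bad.encode("utf-8"))   # stray carriage return, form feed and backspace appear
```

Under this assumption the second result reproduces the bytes printed above, while the single-pass version keeps `\rtf1`, `\fonttbl`, `\red255` and `\blue255` intact.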
Upvotes: 0
Views: 164
Reputation: 378
The problem does not seem to be related to the Java Jena API but to Python's rdflib. I will open another question about the specific issue with rdflib.
Upvotes: 0