SVS

Reputation: 61

How to do a batch insert of data in GraphDB in a transaction

I am trying to insert data into GraphDB. The SPARQL update query consists of statements with a total size of ~1M, including some DELETE and WHERE statements. I failed to do it using the GraphDB REST API:

1) I successfully started a transaction with POST /repositories/{repositoryID}/transactions
2) Then I sent the update request (code snippet in Python):

import requests

requests.put(
    url='/repositories/{repositoryID}/transactions/{transactionID}',
    params={'update': sparql, 'action': 'ADD'}
)

and got this error:

HTTP Status 400 – Bad Request
Message: Request header is too large
java.lang.IllegalArgumentException: Request header is too large

These SPARQL statements execute successfully in the Workbench SPARQL console. But if I increase the amount of data, I get

java.lang.StackOverflowError

in the Workbench UI.

The SPARQL I want to execute looks like the following:

PREFIX time: <http://www.w3.org/2006/time#> 
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
PREFIX tr: <http://www.semanticweb.org/tr#> 
PREFIX owl: <http://www.w3.org/2002/07/owl#>  
PREFIX geosparql: <http://www.opengis.net/ont/geosparql#>

DELETE { tr:ontologyVersion tr:hasTimestamp ?o . }
INSERT {
    tr:ontologyVersion a time:Instant, owl:NamedIndividual ; 
                       tr:hasTimestamp "2019-10-11T14:56:06.750130+00:00"^^xsd:dateTime . 

    <a lot of new triples>
} 
WHERE { 
    OPTIONAL { tr:ontologyVersion tr:hasTimestamp ?o . } 
} 

So how do I insert data into GraphDB? What is the correct way to do it?

UPDATE 1

I rewrote the code to use

requests.put(url=url, data={'update': sparql}, params={'action': 'COMMIT'}) 

and used sparql = "DELETE DATA {}; INSERT DATA {}". The request completed with response code 200, but for some reason the data are not in GraphDB.

UPDATE 2

According to the RDF4J server REST API, I changed the requests to

requests.put(url=transaction_url, data={'update': sparql}, params={'action': 'UPDATE'}) 
requests.put(url=transaction_url, params={'action': 'COMMIT'}) 

I still use sparql = "DELETE DATA {}; INSERT DATA {}". The request is sent with content type 'application/x-www-form-urlencoded' and a URL-encoded SPARQL string.

In that case I get a 406 error:

org.eclipse.rdf4j.http.server.ClientHTTPException: Could not read SPARQL update string from body.

Upvotes: 0

Views: 1240

Answers (2)

SVS

Reputation: 61

Finally, I came up with the solution.

requests.put(url=transaction_url, data=sparql, params={'action': 'UPDATE'}, headers={'Content-Type': 'application/sparql-update'}) 
requests.put(url=transaction_url, params={'action': 'COMMIT'}) 

As it turned out, the RDF4J transactions API expects the query to be in the body as-is, without any URL encoding or an 'update=' parameter name. I found it here: java/org/eclipse/rdf4j/http/server/repository/transaction/TransactionController.java
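For reference, here is a minimal end-to-end sketch of the flow that worked for me (base_url, repo_id, and the example update string are placeholders; the transaction URL comes back in the Location header when the transaction is created):

import requests

base_url = 'http://localhost:7200'  # placeholder: your GraphDB address
repo_id = 'myrepo'                  # placeholder: your repository ID
sparql = 'INSERT DATA { <urn:a> <urn:b> <urn:c> }'  # your actual update string

# 1) start a transaction; the transaction URL is returned in the Location header
resp = requests.post(f'{base_url}/repositories/{repo_id}/transactions')
resp.raise_for_status()
transaction_url = resp.headers['Location']

# 2) execute the update: raw SPARQL in the body, no URL encoding, no 'update=' name
resp = requests.put(
    transaction_url,
    data=sparql.encode('utf-8'),
    params={'action': 'UPDATE'},
    headers={'Content-Type': 'application/sparql-update'},
)
resp.raise_for_status()

# 3) commit the transaction
requests.put(transaction_url, params={'action': 'COMMIT'}).raise_for_status()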

Upvotes: 1

Jeen Broekstra

Reputation: 22042

There are two separate problems going on here.

The first problem occurs when you try to execute a SPARQL update as part of a transaction. The error message is "Request header is too large". This sounds to me like your request is trying to send the payload in the request line, rather than as a data payload (params in requests ends up in the URL's query string). I think you may want to change your Python code slightly, to something like:

requests.put(
    url='/repositories/{repositoryID}/transactions/{transactionID}',
    data={'update': sparql, 'action': 'ADD'}
)

(so data instead of params)

The second problem sounds like a limitation of the Workbench UI (assuming that is what throws the StackOverflowError). But apart from that, the way you are inserting new data is very inefficient: you're doing an OPTIONAL query as part of a bulk data upload, and using an INSERT...WHERE as well.

I'd suggest using an INSERT DATA command for the bulk upload instead:

INSERT DATA {
  # ... large amount of triples
}

If the timestamp handling is part of what you're trying to achieve, I suggest you do the query and update of that timestamp as a separate operation, before or after the bulk upload. If you do it in the same transaction, the end result will be the same.
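For example, a rough sketch of that split (transaction_url, the placeholder triple, and the raw-body request style are assumptions here; the prefixes and the timestamp come from your question):

import requests

transaction_url = '...'  # URL of your open transaction

def run_update(transaction_url, sparql):
    # helper for this sketch: send the update as a raw application/sparql-update
    # body, the style that ended up working elsewhere in this thread
    resp = requests.put(
        transaction_url,
        data=sparql.encode('utf-8'),
        params={'action': 'UPDATE'},
        headers={'Content-Type': 'application/sparql-update'},
    )
    resp.raise_for_status()

PREFIXES = """
PREFIX tr:  <http://www.semanticweb.org/tr#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
"""

# operation 1: the bulk upload as a plain INSERT DATA, no WHERE clause involved
run_update(transaction_url, PREFIXES + """
INSERT DATA {
  tr:example tr:p tr:o .  # placeholder for the large set of new triples
}
""")

# operation 2: the timestamp swap as its own small update
run_update(transaction_url, PREFIXES + """
DELETE { tr:ontologyVersion tr:hasTimestamp ?old }
INSERT { tr:ontologyVersion tr:hasTimestamp "2019-10-11T14:56:06+00:00"^^xsd:dateTime }
WHERE  { OPTIONAL { tr:ontologyVersion tr:hasTimestamp ?old } }
""")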

Oh and of course, once you're done, you'll need to commit your transaction.
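In terms of the snippets above, that's just:

# commit; the updates only become visible once this succeeds
requests.put(transaction_url, params={'action': 'COMMIT'}).raise_for_status()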

Upvotes: 2
