Representing transactions/time in RDF

Question

I need to represent electronic health records in RDF. This kind of data is time-dependent, so I want to represent the records as events. I want to use something similar to a Datomic database: Datomic uses triples with an added transaction field, which is timestamped and can carry user-defined metadata. I want to use named graphs to record the transaction/time data.

For instance, in the query below, I only search triples of graphs from a certain editor created on a certain date:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

SELECT ?name ?mbox ?date
WHERE {
    ?g dc:publisher ?name ;
       dc:date ?date .
    GRAPH ?g
    { ?person foaf:name ?name ; foaf:mbox ?mbox }
}

Queries like this one would solve my problem. My concerns are:

I will end up with millions of named graphs. Will they make the SPARQL queries too slow?
The triple store I am using, Blazegraph, has support for inference (entailments) but states that: "Bigdata does not support inference in the quads mode out of the box." Which triple stores do support inference using quads (named graphs)?
Is there a better way to represent this kind of data in RDF? Some kind of best practices guideline?

Jeen Broekstra · Accepted Answer

I will end up with millions of named graphs. Will they make the SPARQL queries too slow?

Generally speaking, not necessarily, at least not any more than adding millions of triples to a single named graph would. But it really depends on your triplestore and how good it is at indexing on named graphs.
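To make the scale question a bit more concrete, here is a minimal sketch of what a single transaction could look like as a named graph, written as a SPARQL Update. The `ex:` namespace, the graph name `ex:tx-0001`, and the properties `ex:hasDiagnosis` and `ex:systolicBP` are made up for illustration; only `dc:publisher` and `dc:date` come from the query in the question. It also assumes the metadata about a graph is stored outside that graph, which is what the question's query expects.

```sparql
PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/ehr/>   # hypothetical namespace

INSERT DATA {
  # Transaction metadata, stored outside the named graph it describes
  ex:tx-0001 dc:publisher "Dr. Example" ;
             dc:date "2015-06-01"^^xsd:date .

  # The facts asserted by this transaction
  GRAPH ex:tx-0001 {
    ex:patient-42 ex:hasDiagnosis ex:hypertension ;
                  ex:systolicBP 150 .
  }
}
```

With this layout each transaction costs one extra graph name plus a couple of metadata triples, so how well it scales comes down to how your store indexes the graph component of each quad.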

The triple store I am using, Blazegraph, has support for inference (entailments) but states that: "Bigdata does not support inference in the quads mode out of the box." Which triple stores do support inference using quads (named graphs)?

Stack Overflow is not really the right platform for tool recommendations. I suggest you search around a bit and compare the feature lists of the various available triplestores.

I also suspect that at the scale you need, inferencing performance might disappoint you (again, depending on the implementation, of course). Are you sure you need inferencing? I'm not saying you definitely shouldn't use it, but depending on the expressivity of the inference you need, there are quite often ways around it by being a bit creative with your queries.
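As one example of such a workaround: if all you need is RDFS-style subclass reasoning, a SPARQL 1.1 property path can walk the class hierarchy at query time. This is only a sketch. The class `ex:ClinicalEvent` and the `ex:` namespace are hypothetical, and it assumes the schema triples (the `rdfs:subClassOf` statements) are queryable outside the transaction graphs.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/ehr/>   # hypothetical namespace

SELECT ?g ?event
WHERE {
  # The asserted (non-inferred) type of each event, per transaction graph
  GRAPH ?g { ?event rdf:type ?type . }

  # Follow zero or more subClassOf links up to the class of interest,
  # so direct instances and instances of any subclass both match
  ?type rdfs:subClassOf* ex:ClinicalEvent .
}
```

This moves the cost from load time to query time, so it is worth measuring on your own data before relying on it.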

Is there a better way to represent this kind of data in RDF? Some kind of best practices guideline?

It looks like a sensible approach to me. Whether another way is better is hard to judge without knowing more about how you intend to use this data, the scale (in number of triples), etc. As for best practices: the W3C note on N-ary relations in RDF ("Defining N-ary Relations on the Semantic Web") is a good resource. See also the related question "How can I express additional information (time, probability) about a relation in RDF?"
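For comparison, the n-ary relation pattern avoids named graphs by turning each observation into a resource of its own and attaching the time and author to that resource. A rough sketch, with every `ex:` name invented for illustration rather than taken from the W3C note:

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX ex:  <http://example.org/ehr/>   # hypothetical namespace

INSERT DATA {
  # The patient points at an observation node instead of a bare value
  ex:patient-42 ex:hasObservation ex:obs-0001 .

  # The observation node carries the value plus its time and provenance
  ex:obs-0001 a ex:BloodPressureObservation ;
      ex:systolicBP 150 ;
      ex:recordedAt "2015-06-01T09:30:00Z"^^xsd:dateTime ;
      ex:recordedBy ex:dr-example .
}
```

The trade-off is that everything lives in plain triples, which sidesteps the quads-mode inference limitation, but every time-dependent fact needs its own intermediate node.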
