user3582433

Reputation: 85

Removing duplicate triples from an RDF file

Should I remove duplicate triples from my RDF file? For example, I have these blocks within a file:

<http://Group/row1>
    vocab:regione Campania ;
    vocab:nome Napoli ;
    vocab:codice NA .

and

<http://Group/row1>
    vocab:nome Napoli ;
    vocab:codice NA .

The triples in the second block all also appear within the first block. Should the second block be removed from the file?

Upvotes: 0

Views: 862

Answers (1)

Joshua Taylor

Reputation: 85883

RDF is a graph-based representation, and a graph (in this sense) is a set of edges. Sets, by definition, don't have duplicate elements. Of course, a specific serialization of an RDF graph can state the same triple more than once, and there might be reasons that you would want to avoid that. To be explicit about what is in the data, your first block is three triples:

group:row1  vocab:codice   "NA" .
group:row1  vocab:nome     "Napoli" .
group:row1  vocab:regione  "Campania" .

and your second block is two triples:

group:row1  vocab:codice  "NA" .
group:row1  vocab:nome    "Napoli" .
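
The set semantics can be sketched in a few lines of Python. This is only an illustration, not an RDF library: triples are modeled as plain tuples of strings, and the names block1, block2 and graph are made up for the example.

```python
# Model each triple as a (subject, predicate, object) tuple of strings.
# This is a toy stand-in for an RDF graph, not a real parser.
block1 = {
    ("group:row1", "vocab:regione", '"Campania"'),
    ("group:row1", "vocab:nome", '"Napoli"'),
    ("group:row1", "vocab:codice", '"NA"'),
}
block2 = {
    ("group:row1", "vocab:nome", '"Napoli"'),
    ("group:row1", "vocab:codice", '"NA"'),
}

# A graph is a set of triples, so merging the two blocks is a set union.
graph = block1 | block2

print(len(graph))       # 3
print(graph == block1)  # True
```

The second block adds nothing to the graph, because every triple in it is already a member of the set.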

At any rate: (i) having the same triples represented more than once in your data shouldn't actually be a problem; (ii) if you want to remove the duplicates, then reading in the graph (with just about any RDF processing tool) and writing it out again should give you a representation without duplicated information. For instance, suppose you have the following as data.rdf:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>

Here's what you get when you read it in with Jena's rdfcat and write it out again:

$ rdfcat data.rdf
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>
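
If no RDF toolkit is at hand and the data is in N-Triples (one triple per line), a rough textual version of the same idea is to keep each distinct line once. The sketch below assumes that format and uses a hypothetical helper name, dedupe_ntriples. It only catches lines that are identical after stripping surrounding whitespace, so it is not a substitute for actually parsing the graph.

```python
def dedupe_ntriples(text):
    """Return N-Triples text with repeated triple lines removed, order kept.

    Hypothetical helper: compares lines textually, so two spellings of the
    same triple (e.g. different internal spacing) are NOT recognized as equal.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Pass blank lines and comments through untouched.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        if stripped not in seen:
            seen.add(stripped)
            kept.append(stripped)
    return "\n".join(kept) + "\n"


GROUP = "http://stackoverflow.com/q/23241612/1281433/group/"
VOCAB = "http://stackoverflow.com/q/23241612/1281433/vocab/"

# The same five statements as in data.rdf, written as N-Triples.
data = "\n".join([
    f'<{GROUP}row1> <{VOCAB}regione> "Campania" .',
    f'<{GROUP}row1> <{VOCAB}nome> "Napoli" .',
    f'<{GROUP}row1> <{VOCAB}codice> "NA" .',
    f'<{GROUP}row1> <{VOCAB}nome> "Napoli" .',
    f'<{GROUP}row1> <{VOCAB}codice> "NA" .',
])

result = dedupe_ntriples(data)
print(result, end="")
```

For other serializations such as RDF/XML or Turtle, where one triple can be spread over several lines, a real parser is the safer route.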

Upvotes: 5
