Reputation: 85
Should I remove duplicate triples from my RDF file? For example, I have these blocks within a file:
<http://Group/row1>
vocab:regione Campania ;
vocab:nome Napoli ;
vocab:codice NA .
and
<http://Group/row1>
vocab:nome Napoli ;
vocab:codice NA .
The triples in the second block all appear in the first block as well. Should the second block be removed from the file?
Upvotes: 0
Views: 862
Reputation: 85883
RDF is a graph-based representation, and a graph (in this sense) is a set of edges. Sets, by definition, don't have duplicate elements. Of course, a specific serialization of an RDF graph could write out the same triple more than once, and there might be reasons that you would want to avoid that. As a note about terminology, the thing that you've called "Triple 1" is actually three triples:
group:row1 vocab:codice "NA" .
group:row1 vocab:nome "Napoli" .
group:row1 vocab:regione "Campania" .
and what you've called "Triple 2" is actually two triples:
group:row1 vocab:codice "NA" .
group:row1 vocab:nome "Napoli" .
At any rate: (i) it shouldn't actually be a problem that you have the same triples represented multiple times in your data; (ii) if you want to remove them, then reading in the graph (with just about any RDF processing tool) and writing it out again should give you a representation without duplicated information. For instance, suppose you have the following as data.rdf:
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>
Here's what you get when you read it in with Jena's rdfcat and write it out again:
$ rdfcat data.rdf
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:group="http://stackoverflow.com/q/23241612/1281433/group/"
    xmlns:vocab="http://stackoverflow.com/q/23241612/1281433/vocab/">
  <rdf:Description rdf:about="http://stackoverflow.com/q/23241612/1281433/group/row1">
    <vocab:regione>Campania</vocab:regione>
    <vocab:nome>Napoli</vocab:nome>
    <vocab:codice>NA</vocab:codice>
  </rdf:Description>
</rdf:RDF>
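If it helps to see why the round trip collapses the repeats, here is a rough Python sketch of the set semantics, using only the standard library. It is not a real RDF parser: the triples are plain string tuples copied from the example above, and `load_graph` is a hypothetical helper name made up for illustration.

```python
# Minimal sketch: model an RDF graph as a set of (subject, predicate, object)
# tuples. The prefixed names are plain strings from the example above; no real
# RDF parsing or IRI resolution happens here.

def load_graph(blocks):
    """Merge several lists of triples into one set, so repeated triples collapse."""
    graph = set()
    for block in blocks:
        graph.update(block)
    return graph


first_block = [
    ("group:row1", "vocab:regione", "Campania"),
    ("group:row1", "vocab:nome", "Napoli"),
    ("group:row1", "vocab:codice", "NA"),
]
second_block = [
    ("group:row1", "vocab:nome", "Napoli"),
    ("group:row1", "vocab:codice", "NA"),
]

graph = load_graph([first_block, second_block])

# Write the graph back out, one triple per line, in a stable (sorted) order.
for s, p, o in sorted(graph):
    print(f'{s} {p} "{o}" .')
```

Five triples go in and three distinct ones come out, which is the same effect as reading the file into a graph and serializing it again with an RDF tool.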
Upvotes: 5