Reputation: 436
I have an RDF file, for example:
```xml
<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dbp="http://dbpedia.org/ontology/"
    xmlns:dbprop="http://dbpedia.org/property/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
    <dbp:birthDate>1685-03-21</dbp:birthDate>
    <dbp:deathDate>1750-07-28</dbp:deathDate>
    <dbp:birthPlace>Eisenach</dbp:birthPlace>
    <dbp:deathPlace>Leipzig</dbp:deathPlace>
    <dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
    <foaf:name>Johann Sebastian Bach</foaf:name>
    <rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
</rdf:RDF>
```
and I'd like to extract only the textual parts of this file, i.e., my output in this case would be:
output_text = "Johann Sebastian Bach, German composer and organist, 1685-03-21, 1750-07-28, Eisenach, Leipzig"
How can I get this result using RDFLib?
Upvotes: 3
Views: 4620
Reputation: 860
Building on Joshua Taylor's answer, the method you are looking for is `toPython`, which, according to the docs, "Returns an appropriate python datatype derived from this RDF Literal". This snippet should return what you are looking for:
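```python
import rdflib

raw_data = """<?xml version="1.0"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dbp="http://dbpedia.org/ontology/"
    xmlns:dbprop="http://dbpedia.org/property/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
    <dbp:birthDate>1685-03-21</dbp:birthDate>
    <dbp:deathDate>1750-07-28</dbp:deathDate>
    <dbp:birthPlace>Eisenach</dbp:birthPlace>
    <dbp:deathPlace>Leipzig</dbp:deathPlace>
    <dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
    <foaf:name>Johann Sebastian Bach</foaf:name>
    <rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
  </rdf:Description>
</rdf:RDF>"""

graph = rdflib.Graph()
# Pass format="xml" explicitly: newer RDFLib versions default to Turtle
# when the format of a data string cannot be guessed.
graph.parse(data=raw_data, format="xml")

output = []
for s, p, o in graph:
    # Keep only literal objects; URI objects such as the rdf:type values are skipped.
    if isinstance(o, rdflib.term.Literal):
        # toPython() can return non-strings (e.g. datetime.date for typed
        # literals), so convert back to str before joining.
        output.append(str(o.toPython()))

# Note: triple iteration order is not guaranteed, so the values may come out
# in a different order from the one in the question.
print(', '.join(output))
```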
Upvotes: 8
Reputation: 85813
This is relatively straightforward, at least in terms of the conceptual task. You need to parse the file into a graph, iterate over its triples, keep the objects that are literals, and join their string values.
I'm not much of a Python user, and so not much of an RDFLib user either, but these steps shouldn't be all that difficult. Getting started with RDFLib (from the RDFLib documentation) shows how you can read a graph and iterate over the triples:
```python
import rdflib

g = rdflib.Graph()
result = g.parse("http://www.w3.org/People/Berners-Lee/card")

# Iterate over triples in store and print them out.
print("--- printing raw triples ---")
for s, p, o in g:
    print((s, p, o))
```
Now, instead of `print((s, p, o))` in that `for` body, you'll need to check whether `o` is a literal (an instance of `rdflib.term.Literal`). If there are literals of non-string types, you will either want to concatenate their lexical forms, or concatenate only plain literals (literals with no language tag and no datatype), the string part of literals with language tags, and the lexical form of literals whose datatype is `xsd:string`.
Upvotes: 4