Marcelo

Reputation: 436

Text from RDF with RDFlib in Python

I have an rdf file, for example:

<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dbp="http://dbpedia.org/ontology/"
xmlns:dbprop="http://dbpedia.org/property/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
      <dbp:birthDate>1685-03-21</dbp:birthDate>
      <dbp:deathDate>1750-07-28</dbp:deathDate>
      <dbp:birthPlace>Eisenach</dbp:birthPlace>
      <dbp:deathPlace>Leipzig</dbp:deathPlace>
      <dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
      <foaf:name>Johann Sebastian Bach</foaf:name>
      <rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    </rdf:Description>
</rdf:RDF> 

and I'd like to extract only the textual parts of this file, i.e., my output in this case would be:

output_text = "Johann Sebastian Bach, German composer and organist, 1685-03-21, 1750-07-28, Eisenach, Leipzig"

How can I get this result using RDFlib?

Upvotes: 3

Views: 4620

Answers (2)

Ted Lawless

Reputation: 860

Building on Joshua Taylor's answer, the method you are looking for is toPython, which according to the docs "Returns an appropriate python datatype derived from this RDF Literal". This snippet should return what you are looking for:

raw_data = """<?xml version="1.0"?>
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dbp="http://dbpedia.org/ontology/"
xmlns:dbprop="http://dbpedia.org/property/"
xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <rdf:Description rdf:about="http://dbpedia.org/page/Johann_Sebastian_Bach">
      <dbp:birthDate>1685-03-21</dbp:birthDate>
      <dbp:deathDate>1750-07-28</dbp:deathDate>
      <dbp:birthPlace>Eisenach</dbp:birthPlace>
      <dbp:deathPlace>Leipzig</dbp:deathPlace>
      <dbprop:shortDescription>German composer and organist</dbprop:shortDescription>
      <foaf:name>Johann Sebastian Bach</foaf:name>
      <rdf:type rdf:resource="http://dbpedia.org/class/yago/GermanComposers"/>
      <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    </rdf:Description>
</rdf:RDF>"""
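import rdflib

graph = rdflib.Graph()
# Name the format explicitly rather than relying on the default, which differs between rdflib versions.
graph.parse(data=raw_data, format="xml")

output = []

for s, p, o in graph:
    # Keep only literal objects; URI objects such as the rdf:type values are skipped.
    if isinstance(o, rdflib.term.Literal):
        # str() guards against toPython() returning a non-string for typed literals.
        output.append(str(o.toPython()))

# Triple order in a graph is not guaranteed, so the order of the values may vary.
print(', '.join(output))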

Upvotes: 8

Joshua Taylor

Reputation: 85813

This is relatively straightforward, at least in terms of the conceptual task. You need to

  • read the RDF document into an rdflib Graph
  • iterate through the statements (triples) in the graph
    • if the statement's object is a literal
    • then concatenate the lexical form of the literal into the string that you're building

I'm not much of a Python user, and so not much of an RDFlib user either, but these steps shouldn't be all that difficult. Getting started with RDFLib (from the RDFlib documentation) shows how you can read a graph and iterate over the triples:

import rdflib

g = rdflib.Graph()
result = g.parse("http://www.w3.org/People/Berners-Lee/card")

# Iterate over triples in store and print them out.
print("--- printing raw triples ---")
for s, p, o in g:
    print((s, p, o))

Now, instead of print((s, p, o)) in the body of that for loop, you'll need to check whether o is a literal (an instance of rdflib.term.Literal). If there are literals of non-string types, you have two options: concatenate the lexical forms of all literals regardless of type, or concatenate only the textual ones, that is, plain literals (no language tag and no datatype), the string part of literals with language tags, and the lexical form of literals whose datatype is xsd:string.


Upvotes: 4
