Mulone

Reputation: 3663

Python and rdflib: parsing issues

I am using rdflib (package python-rdflib 2.4.2-3, Python 2.7.5+, Ubuntu 13.10) with immense frustration. I am simply trying to load two N-Triples (NT) files into a local triple store. This is the code:

from rdflib import Graph

graph = Graph('Sleepycat')
rt = graph.open(fn)
# everything is fine, triplestore open.

# try to parse some files
ex = "http://dbpedia.org/data3/Place.ntriples"
# ex = "http://dbpedia.org/data3/Place.n3" # another option
# none of the following works
graph.parse( ex )
graph.parse( ex, "n3" )
graph.parse( ex, "nt" )
graph.parse( ex, "ntriple" )
graph.parse( ex, "thisisrubbish" )

This code always raises xml.sax._exceptions.SAXParseException: http://dbpedia.org/data3/Place.ntriples:1:6: not well-formed (invalid token). Evidently parse is defaulting to the RDF/XML format and trying to parse the N-Triples text as XML, which fails. As the last call shows, the code doesn't even check whether the format exists; it just ignores the argument.

Another irritating aspect is that parse seems to delete everything already in the graph, which is not the behaviour described in the documentation:

graph = Graph('Sleepycat')
graph.open("somewhere.db")

graph.parse(input1) # graph contains input1
graph.parse(input2) # graph contains only input2, but should contain input1+input2.

I have to admit that the library looks too buggy to be usable.

Any ideas on how to debug this, or suggestions for alternatives in Python?


Upvotes: 4

Views: 2230

Answers (1)

Ted Lawless

Reputation: 860

You need to use the format keyword when calling parse:

graph.parse( ex, format="nt" )

As you noticed, parse defaults to RDF/XML when no format is given.
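To see why a positional second argument never reaches the parser, here is a minimal sketch. It assumes the signature of parse is roughly parse(source, publicID=None, format="xml"), which I believe matches rdflib 2.4.x but have not checked against your exact package. The parse function below is a hypothetical stand-in, not rdflib code; it only reports how its arguments were bound.

```python
# Hypothetical stand-in mirroring the ASSUMED shape of Graph.parse.
# It is NOT rdflib code: it just returns how its arguments were bound.
def parse(source, publicID=None, format="xml"):
    return {"source": source, "publicID": publicID, "format": format}


ex = "http://dbpedia.org/data3/Place.ntriples"

# Positional call: "nt" is bound to publicID, so format keeps its default.
positional = parse(ex, "nt")
print(positional["publicID"], positional["format"])

# Keyword call: "nt" is bound to format, as intended.
keyword = parse(ex, format="nt")
print(keyword["publicID"], keyword["format"])
```

If that assumption holds, it would also explain why "thisisrubbish" is accepted without complaint: it is never looked up as a format at all.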

I would also recommend upgrading from rdflib 2.4.2-3 to the latest version, 4.1.2. You can get this with easy_install or pip. The docs you reference are for version 4+.

Upvotes: 3
