What is the benefit of defining datatypes for literals in an RDF graph?

Question

I am using rdflib in Python to build my first RDF graph. However, I do not understand the explicit purpose of defining Literal datatypes. I have scoured the documentation and done my due diligence with Google and the Stack Overflow search, but I cannot find an actual explanation for this. Why not just leave everything as a plain old Literal?

From what I have experimented with, is this so that you can search for explicit terms in your SPARQL query with BIND? Does it also help with FILTERing, e.g. FILTER (?var1 > ?var2), where ?var1 and ?var2 should represent integers, floats, etc.? Does it help with query speed? Or am I just way off altogether?

Specifically, why add the following triple to mygraph:

mygraph.add((amazingrdf, ns['hasValue'], Literal('42.0', datatype=XSD.float)))

instead of just this?

mygraph.add((amazingrdf, ns['hasValue'], Literal("42.0")))

I suspect there is some purpose I am overlooking. I appreciate your help and explanations; I want to learn this right the first time! Thanks!

cygri · Accepted Answer

Comparing two xsd:integer values in SPARQL:

ASK { FILTER (9 < 15) }

Result: true. Now with xsd:string:

ASK { FILTER ("9" < "15") }

Result: false, because strings are compared character by character: the first characters already decide it, and "9" sorts after "1".
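Python's built-in str comparison is also lexicographic, so the two results above can be reproduced without any RDF tooling. This is only an analogy to SPARQL's ordering rules, not rdflib code:

```python
# Python analogue of the two SPARQL FILTERs above: numbers compare by value,
# strings compare lexicographically, one character at a time.
numeric_result = 9 < 15     # like FILTER (9 < 15) on xsd:integer
string_result = "9" < "15"  # like FILTER ("9" < "15") on xsd:string

print(numeric_result)  # True
print(string_result)   # False: "9" vs "1" decides it, and "9" > "1"

# The same effect shows up when sorting a column of values.
as_strings = sorted(["9", "15", "100"])
as_numbers = sorted([9, 15, 100])
print(as_strings)  # ['100', '15', '9']
print(as_numbers)  # [9, 15, 100]
```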

Some equality checks with xsd:decimal:

ASK { FILTER (+1.000 = 01.0) }

Result: true; it’s the same number. Now with xsd:string:

ASK { FILTER ("+1.000" = "01.0") }

False, because they are clearly different strings.
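Python's standard decimal module draws the same line between lexical form and value: the two spellings differ as strings but compare equal once parsed. Again, this only mirrors the SPARQL behaviour; it does not call rdflib:

```python
from decimal import Decimal

# Two different lexical forms of the same xsd:decimal value.
a, b = "+1.000", "01.0"

same_string = a == b                    # compares the characters
same_number = Decimal(a) == Decimal(b)  # compares the parsed values

print(same_string)  # False: different strings
print(same_number)  # True: same number once parsed
print(Decimal(b))   # 1.0 (the leading zero is not part of the value)
```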

Doing some maths with xsd:integer:

SELECT (1+1 AS ?result) {}

It returns 2 (as an xsd:integer). Now for strings:

SELECT ("1"+"1" AS ?result) {}

It returns "11" as an xsd:string, because adding strings is interpreted as string concatenation (at least in Jena where I tried this; in other SPARQL engines, adding two strings might be an error, returning nothing).
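Python happens to make the same choice as Jena here, so the contrast can be reproduced without any RDF library (an analogy only; SPARQL engines that reject string addition behave differently):

```python
# Python's "+" adds numbers but concatenates strings, the same split
# Jena applied to 1+1 versus "1"+"1" above.
int_sum = 1 + 1
str_sum = "1" + "1"
print(int_sum)  # 2
print(str_sum)  # 11

# Getting numeric addition back from strings means supplying the type
# yourself, which is the information an xsd:integer datatype carries.
parsed_sum = int("1") + int("1")
print(parsed_sum)  # 2
```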

As you can see, using the right datatype is important to communicate your intent to code that works with the data. The SPARQL examples make this very clear, but when working directly with an RDF API, the same kinds of issues crop up around object identity, ordering, and so on.
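To make the RDF API side concrete for the question's "42.0" literal, here is a minimal sketch of what a library does when it turns a literal into a native value. The `to_value` helper and `CONVERTERS` table are hypothetical and cover only a few datatypes; rdflib has its own, far more complete mapping (exposed through `Literal.toPython()`), which this does not reproduce:

```python
from decimal import Decimal

# Hypothetical lookup table: XSD datatype IRI -> Python constructor.
# Illustration only, not rdflib's implementation.
XSD = "http://www.w3.org/2001/XMLSchema#"
CONVERTERS = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    # Python has no single-precision float, so xsd:float maps to float
    # as an approximation.
    XSD + "float": float,
    XSD + "double": float,
    XSD + "string": str,
}


def to_value(lexical, datatype=None):
    """Map a literal's lexical form to a Python value using its datatype.

    With no datatype the literal is treated as xsd:string (the RDF 1.1 rule
    for simple literals), so it stays a string. Unknown datatypes also fall
    back to str.
    """
    converter = CONVERTERS.get(datatype or XSD + "string", str)
    return converter(lexical)


raw = ["42.0", "9.5", "100"]
untyped = [to_value(s) for s in raw]              # like Literal("42.0")
typed = [to_value(s, XSD + "float") for s in raw]  # like datatype=XSD.float

print(sorted(untyped))  # ['100', '42.0', '9.5']
print(sorted(typed))    # [9.5, 42.0, 100.0]
print(max(typed))       # 100.0
```

Without the datatype, nothing downstream can tell that "100" should sort after "42.0"; with it, ordering and arithmetic come out as intended.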

As shown in the examples above, SPARQL offers convenient syntax for xsd:string, xsd:integer and xsd:decimal (and, not shown, for xsd:boolean and for language-tagged strings). That elevates those datatypes above the rest.
