Reputation: 10102
As far as I know, the semantic web consists of triples of URIs. Namespace shorthands are widely used to abbreviate them in daily use. I thought namespace shorthands would be expanded to URIs by simple concatenation, e.g. the famous dc:title in the well-known dc: namespace (which is defined as http://purl.org/dc/elements/1.1/, note that the last character is a /) would be expanded to, and hence be semantically equal to, http://purl.org/dc/elements/1.1/title.
Then I came across some namespace definitions which lack a sensible separation character at their end. Some examples from http://live.dbpedia.org/sparql?nsdecl and some from the Most common RDF namespaces list:
How should such namespaces be expanded into valid linked data URIs?
The W3C Recommendation Namespaces in XML defines:
An expanded name is a pair consisting of a namespace name and a local name.
And Fredrik Lundh writes on effbot.org:
In an Element tree, qualified names are stored as universal names in Clark’s notation, which combines the URI and the local part into a single string, given as ‘{uri}local’.
This may be suitable for a wide range of use cases, but it doesn’t conform to the idea that linked data consists of URIs, which cannot start with a {.
I would have thought that xsd:element should not be expanded to http://www.w3.org/2001/XMLSchemaelement in linked data (nor to {http://www.w3.org/2001/XMLSchema}element), should it? How must this be implemented correctly?
Upvotes: 3
Views: 912
Reputation: 13177
There are basically two main reasons for this discrepancy:
A namespace in XML is an individual resource, identified by its URI. It is actually a collection of at least 3 distinct kinds of resources: elements, global attributes (used with a prefix), and local attributes (separately for each element, without a prefix).
An XML namespace does not have to consist of just elements or attributes, and in general, no resource in it needs to be identifiable by a URI. It is up to the namespace to define any resources and assign any QNames as it sees fit (as an example, the XPath errors namespace does not even define any elements or attributes).
In order to "resolve" a resource in an XML namespace, you therefore need at least 3 things: the namespace URI, the name of the resource within the namespace, and its "kind" (element, global attribute, or something else).
On the other hand, an RDF namespace is "virtual" ‒ it arises implicitly from resources having the same prefix; there is no need to define it, and so its URI does not actually have to identify any resource (though it is of course desirable to identify the RDF vocabulary with the common URI prefix).
The RDF/XML syntax uses XML namespaces to express RDF namespaces for properties and classes, which is convenient but ultimately contradictory to how one should use XML. Such an XML namespace is simply implied from the use of any property or class in an RDF vocabulary, giving rise to both elements and attributes. It also defines a mapping to URIs for its contents, as a plain concatenation of the namespace URI and the local name of the resource.
This allows for some overlap of both plain XML and RDF/XML data and makes it look like your RDF is just XML, but any more advanced XML technology (such as XML Schema or XSLT) is generally unusable for RDF/XML. To use both side by side, you have to treat your XML namespace like RDF/XML does.
With this in mind, let's address your observations and concerns:
lid: scheme to identify an entity identified by a QName, but it is quite cumbersome).
xml: namespace is preserved this way, for example in the RDFa Core Initial Context, but it is essentially meaningless (unless used with an empty local part).
#. This is not incorrect, but from the perspective of XML, it is completely arbitrary.
#, /, :, or possibly any other from gen-delims or sub-delims), just concatenate as usual, otherwise join the two parts with #, unless the namespace URI already has a fragment, in which case you could use /. This however conflates namespaces with # at the end with those without, and does not distinguish between elements, attributes or other kinds of XML resources. You could fix that by decorating attributes with @ for example, but that is inconsistent with how both XML and RDF map some properties.
Upvotes: 0
Reputation: 85813
From the RDF/XML Syntax Specification (Revised) [emphasis added]:
In order to encode the graph in XML, the nodes and predicates have to be represented in XML terms — element names, attribute names, element contents and attribute values. RDF/XML uses XML QNames as defined in Namespaces in XML [XML-NS] to represent RDF URI references. All QNames have a namespace name which is a URI reference and a short local name. In addition, QNames can either have a short prefix or be declared with the default namespace declaration and have none (but still have a namespace name)
The RDF URI reference represented by a QName is determined by appending the local name part of the QName after the namespace name (URI reference) part of the QName. This is used to shorten the RDF URI references of all predicates and some nodes. RDF URI references identifying subject and object nodes can also be stored as XML attribute values. RDF literals, which can only be object nodes, become either XML element text content or XML attribute values.
It is simple concatenation. It's the concatenated result that matters. This means that I can use
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctermsx: <http://purl.org/dc/terms/accrual> .
dcterms:accrualPolicy === http://purl.org/dc/terms/accrualPolicy
dctermsx:Policy === http://purl.org/dc/terms/accrualPolicy
dcterms:accrualPeriodicity === http://purl.org/dc/terms/accrualPeriodicity
dctermsx:Periodicity === http://purl.org/dc/terms/accrualPeriodicity
It's interesting that the RDF/XML syntax specification has to define how QNames are interpreted. Why didn't it just inherit the meaning from the XML QName specifications? The answer is in the article that you cited:
The XML Namespaces specification doesn’t explicitly state how an application should treat the (URI, local part) pair. While most applications treat them as two distinct components, some applications expect you to combine them in different ways.
In RDF/XML, applications treat the (URI,local part) pair as a reference to the URI that is the concatenation of uri and local, as stated in the initial quotation from the RDF syntax document. The convention, of course, is that URIs defined by a vocabulary are such that there is a common namespace and that the terms are easy to write using that namespace as an XML prefix, so in practice you won't see the sort of namespace mangling that I showed above with the DCMI terms.
In ElementTree, the QName corresponds to {uri}local. That's how that application treats the (URI,local part) pair.
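To make the contrast concrete, the following sketch parses a one-element document with Python's `xml.etree.ElementTree`, shows the Clark-notation tag, and then applies the RDF/XML concatenation rule to it. The `clark_to_rdf_uri` helper is a hypothetical name introduced here for illustration; ElementTree itself does not provide it.

```python
import xml.etree.ElementTree as ET

doc = '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Example</dc:title>'
root = ET.fromstring(doc)

# ElementTree stores the qualified name in Clark's notation: '{uri}local'.
clark = root.tag
print(clark)


def clark_to_rdf_uri(tag):
    """Apply the RDF/XML rule (namespace name + local name) to a Clark name."""
    if not tag.startswith("{"):
        return tag  # no namespace: nothing to concatenate
    uri, _, local = tag[1:].partition("}")
    return uri + local


print(clark_to_rdf_uri(clark))
# The same rule applied to the XML Schema namespace, which has no trailing
# separator, gives the awkward-looking result from the question.
print(clark_to_rdf_uri("{http://www.w3.org/2001/XMLSchema}element"))
```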
There are complications that arise from the fact that RDF/XML serializations have to be valid XML. Not every URI can be represented as a QName, because in a QName namespace:localname there are restrictions on what characters can appear in namespace and in localname. For instance, for http://127.0.0.1/789234 you can't have a nice QName like localhost:789234, because the localname cannot start with a digit. (For instance, see this thread on the Jena-users mailing list.)
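The restriction can be sketched in Python as follows. This is a deliberately simplified, ASCII-only approximation of the XML NCName rule (the real production also admits many non-ASCII characters), and both `is_ascii_ncname` and `split_for_qname` are hypothetical helpers written for this example.

```python
import re

# Simplified assumption: ASCII letters or '_' first, then letters, digits,
# '_', '.', or '-'. The real XML NCName production is broader than this.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_ascii_ncname(s):
    return NCNAME_ASCII.fullmatch(s) is not None


def split_for_qname(uri):
    """Return (namespace, local) using the longest suffix that is a valid
    local name, or None if no suffix qualifies."""
    for i in range(len(uri)):
        if is_ascii_ncname(uri[i:]):
            return uri[:i], uri[i:]
    return None


print(split_for_qname("http://purl.org/dc/elements/1.1/title"))
print(split_for_qname("http://127.0.0.1/789234"))
```

For the second URI every candidate suffix starts with a digit, so under these assumptions no namespace/local-name split exists and the function returns `None`.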
Another complication or confusion arises from the fact that there are RDF serializations other than RDF/XML, and some of these adopt a prefix/suffix notation that is superficially similar to XML QNames, but relaxes some of these constraints, so you may see prefix/suffix combinations that wouldn't be valid XML QNames, but that's OK for those formats.
The prefixes defined on the DBpedia SPARQL endpoint highlight this issue. From the SPARQL standard, section 4.1.1.1 Prefixed Names [emphasis added]:
The PREFIX keyword associates a prefix label with an IRI. A prefixed name is a prefix label and a local part, separated by a colon ":". A prefixed name is mapped to an IRI by concatenating the IRI associated with the prefix and the local part. The prefix label or the local part may be empty. Note that SPARQL local names allow leading digits while XML local names do not. SPARQL local names also allow the non-alphanumeric characters allowed in IRIs via backslash character escapes (e.g. ns:id\=123). SPARQL local names have more syntactic restrictions than CURIEs.
In this context, while a prefix like
amz => http://webservices.amazon.com/AWSECommerceService/2005-10-05
would be useless in an RDF/XML serialization, because you'd need to write illegal things like amz:#something or amz:/something, it would be useful (if possibly inconvenient) in SPARQL, where you can write amz:\#something and amz:\/something.
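A small Python sketch of how such escaped SPARQL prefixed names map to IRIs, assuming the backslash-escapable characters listed in the SPARQL 1.1 grammar (`PN_LOCAL_ESC`). The helper name `expand_sparql_prefixed_name` is hypothetical, and the sketch ignores percent-encoding and syntax validation.

```python
import re

# Characters that SPARQL 1.1 allows to be backslash-escaped in a local part.
PN_LOCAL_ESC = re.compile(r"\\([_~.\-!$&'()*+,;=/?#@%])")

PREFIXES = {
    "amz": "http://webservices.amazon.com/AWSECommerceService/2005-10-05",
}


def expand_sparql_prefixed_name(pname, prefixes=PREFIXES):
    """Drop the backslash escapes, then concatenate prefix IRI and local part."""
    label, local = pname.split(":", 1)
    return prefixes[label] + PN_LOCAL_ESC.sub(r"\1", local)


print(expand_sparql_prefixed_name(r"amz:\#something"))
print(expand_sparql_prefixed_name(r"amz:\/something"))
```

The separator that the namespace IRI lacks is supplied by the escaped first character of the local part, which is why such a prefix stays usable in SPARQL.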
Upvotes: 5