Reputation: 2514
The specification for RDF N-Triples states that string literals must be encoded.
https://www.w3.org/TR/n-triples/#grammar-production-STRING_LITERAL_QUOTE
Does this "encoding" have a name I can look up to use it in my programming language? If not, what does it mean in practice?
Upvotes: 2
Views: 1632
Reputation: 378
You could use Literal#n3()
e.g.
# pip install rdflib
>>> from rdflib import Literal
>>> lit = Literal('This "Literal" needs escaping!')
>>> s = lit.n3()
>>> print(s)
"This \"Literal\" needs escaping!"
Upvotes: 3
Reputation: 9498
In addition to Josh's answer. It is almost always a good idea to normalize unicode data to NFC,e.g. in Java you can use the following routine
java.text.Normalizer.normalize("rdf literal", Normalizer.Form.NFKC);
For more information see: http://www.macchiato.com/unicode/nfc-faq
What is NFC?
For various reasons, Unicode sometimes has multiple representations of the same character. For example, each of the following sequences (the first two being single-character sequences) represent the same character:
U+00C5 ( Å ) LATIN CAPITAL LETTER A WITH RING ABOVE U+212B ( Å ) ANGSTROM SIGN U+0041 ( A ) LATIN CAPITAL LETTER A + U+030A ( ̊ ) COMBINING RING ABOVE
These sequences are called canonically equivalent. The first of these forms is called NFC - for Normalization Form C, where the C is for compostion. For more information on these, see the introduction of UAX #15: Unicode Normalization Forms. A function transforming a string S into the NFC form can be abbreviated as toNFC(S), while one that tests whether S is in NFC is abbreviated as isNFC(S).
Upvotes: 2
Reputation: 85813
The grammar productions that you need are right in the document that you linked to:
[9] STRING_LITERAL_QUOTE ::= '"' ([^#x22#x5C#xA#xD] | ECHAR | UCHAR)* '"'
[141s] BLANK_NODE_LABEL ::= '_:' (PN_CHARS_U | [0-9]) ((PN_CHARS | '.')* PN_CHARS)?
[10] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
[153s] ECHAR ::= '\' [tbnrf"'\]
This means that a string literal begins and ends with a double quote ("). Inside of the double quotes, you can have:
Upvotes: 4