I'm developing a TEI P5 XML text edition that requires abbreviations to include <am> elements to signal abbreviation markers and <ex> elements to indicate the expanded matter of these abbreviations within their respective <abbr> and <expan> environments. Now I am investigating whether it is feasible to encode <w> (word) elements as well, like this:
<w>
<choice>
<abbr>þa<am>&combmacr;</am></abbr>
<expan>þa<ex>m</ex></expan>
</choice>
</w>
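For illustration, here is a minimal Python sketch (standard-library ElementTree, not XSLT) of how a processor might pick one branch of the <choice> in this single-word case. It assumes un-namespaced elements as in the fragment above, and it uses a literal combining macron (U+0304) in place of &combmacr;, because ElementTree cannot resolve entities that are not declared. The function name `reading` is hypothetical.

```python
import xml.etree.ElementTree as ET


def reading(elem: ET.Element, branch: str = "expan") -> str:
    """Return the text of elem, keeping only the `branch` child ("abbr" or
    "expan") of each <choice> and dropping the other one."""
    skip = "abbr" if branch == "expan" else "expan"
    parts = []

    def walk(e: ET.Element) -> None:
        if e.tag == skip:
            return  # unchosen branch; its tail is added by the parent loop
        if e.text:
            parts.append(e.text)
        for child in e:
            walk(child)
            if child.tail:
                parts.append(child.tail)

    walk(elem)
    return "".join(parts)


# U+0304 (combining macron) stands in for the &combmacr; entity.
w = ET.fromstring(
    "<w><choice><abbr>þa<am>\u0304</am></abbr>"
    "<expan>þa<ex>m</ex></expan></choice></w>"
)
print(reading(w, "expan"))  # þam
print(reading(w, "abbr"))   # þa followed by a combining macron
```

Because the whole <choice> sits inside one <w>, either reading comes out as exactly one word.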
The syntax is straightforward as long as the <choice> environment encodes one word and one word only. However, I'm worried that multiword abbreviations cannot straightforwardly be combined with word elements, because whichever of <w> or <ex> is opened last must be closed before the other one is. Thus the <expan> line in the following won't validate:
<choice>
<abbr><w>L<am>&baracr;</am></w></abbr>
<expan><w>L<ex>EOFAN</w> <w>MEN</w></ex></expan>
</choice>
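The <expan> line fails before any schema is involved: <w> and <ex> overlap, so it is not well-formed XML. A quick check with Python's standard-library parser makes this visible. This is a sketch; the helper name `is_well_formed` is hypothetical, and only the <expan> line is tested so that the undeclared &baracr; entity does not cause an unrelated error.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML; overlapping elements make it fail."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# <ex> is opened inside the first <w> but closed after the second <w>.
overlapping = "<expan><w>L<ex>EOFAN</w> <w>MEN</w></ex></expan>"
print(is_well_formed(overlapping))  # False (mismatched tag)
```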
I cannot put the whole <ex> element inside a single <w>, nor both <w> elements inside <ex>, because the first letter of the first word (in this example) is in the manuscript and thus does not count as an editorial expansion. Is there any way around this?
NB: My reason for wanting to encode words is not to lemmatize but to be able to encode manuscript word spacing alongside lexical word spacing. By having my XSL strip or preserve space in <w> elements, I can choose, by way of a parameter, to display word spacing either by modern standards or as presented in the manuscript. This is also why it would be undesirable to enclose the entire <choice> environment in a single <w> tag, unless perhaps, in any multiword expansions, I add word spacing using a special character (a non-breaking space, perhaps?) that the XSL stylesheet translates into a regular space. Is that what I should be looking at? (Using XSLT 2.0.)
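The parameter-driven approach can be prototyped outside XSLT. The sketch below is Python rather than XSLT 2.0 and rests on assumptions that are not in the question: source whitespace between and inside <w> elements records manuscript spacing (so the file is not pretty-printed); "modern" mode strips ordinary whitespace inside each <w> and separates <w> elements with single spaces; and a no-break space (U+00A0) marks an editorial word break inside one <w>, becoming a space in modern mode and disappearing in manuscript mode. The names `render`, `expanded_text`, the `spacing` values, and the sample text are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical convention: U+00A0 marks an editorial word break inside one <w>.
NBSP = "\u00a0"


def expanded_text(e: ET.Element) -> str:
    """Text of e with every <abbr> branch skipped, i.e. the expanded reading."""
    if e.tag == "abbr":
        return ""
    out = e.text or ""
    for child in e:
        out += expanded_text(child) + (child.tail or "")
    return out


def render(xml_text: str, spacing: str = "modern") -> str:
    root = ET.fromstring(xml_text)
    if spacing == "manuscript":
        # Keep the source whitespace as-is; editorial breaks are removed.
        return expanded_text(root).replace(NBSP, "")
    if spacing == "modern":
        # Strip ordinary whitespace inside each <w> (explicit class, because
        # \s would also match U+00A0), then turn NBSP into a regular space
        # and separate lexical words with single spaces.
        words = [
            re.sub(r"[ \t\r\n]+", "", expanded_text(w)).replace(NBSP, " ")
            for w in root.iter("w")
        ]
        return " ".join(words)
    raise ValueError(f"unknown spacing mode: {spacing!r}")


# Invented sample: "and" runs into the next word, "cyning" is split.
src = ("<p><w>and</w><w><choice><abbr>þa<am>\u0304</am></abbr>"
       "<expan>þa<ex>m</ex></expan></choice></w> <w>cy ning</w></p>")
print(render(src, "manuscript"))  # andþam cy ning
print(render(src, "modern"))      # and þam cyning

multi = ("<p><w><choice><abbr>L<am>\u0304</am></abbr>"
         "<expan>L<ex>EOFAN\u00a0MEN</ex></expan></choice></w></p>")
print(render(multi, "modern"))      # LEOFAN MEN
print(render(multi, "manuscript"))  # LEOFANMEN
```

In XSLT 2.0 the same split would map onto a stylesheet parameter and a `translate()` call on the no-break space; the Python version is only meant to make the expected outputs concrete.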
You could also put the <choice> inside a <w>, to indicate that you are tokenizing as a single word something that is a single token in abbreviated form but multiple tokens in expanded form.
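A minimal Python sketch of what this encoding implies for tokenization: the markup holds one <w>, whose abbreviated reading is one whitespace-delimited token and whose expanded reading is two. It assumes un-namespaced elements, an ordinary space inside <ex> for the editorial word break, and a combining macron (U+0304) as a placeholder for the &baracr; entity. The helper `branch_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def branch_text(e: ET.Element, keep: str) -> str:
    """Text of e, keeping only the `keep` branch ("abbr" or "expan") of <choice>."""
    drop = "expan" if keep == "abbr" else "abbr"
    if e.tag == drop:
        return ""
    out = e.text or ""
    for child in e:
        out += branch_text(child, keep) + (child.tail or "")
    return out


# One <w> around the whole <choice>; U+0304 is a placeholder for &baracr;.
w = ET.fromstring(
    "<w><choice><abbr>L<am>\u0304</am></abbr>"
    "<expan>L<ex>EOFAN MEN</ex></expan></choice></w>"
)
print(sum(1 for _ in w.iter("w")))          # 1 (iter() includes w itself)
print(len(branch_text(w, "abbr").split()))  # 1
print(branch_text(w, "expan").split())      # ['LEOFAN', 'MEN']
```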
Yes, I too think that <w> inside <choice> makes sense here. As you demonstrated, one abbreviation can unfold into more than one word. Consequently, it would be right to have multiple <ex> elements as well. Why not encode it like this:
<expan><w>L<ex>EOFAN</ex></w> <w><ex>MEN</ex></w></expan>
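A quick Python check (standard library only, un-namespaced as in the fragment) suggests this markup does what is needed: it parses, it yields two lexical words, and the manuscript letter L stays outside every <ex>, so editorial and witnessed letters can still be separated. The helper `outside_ex` and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET


def outside_ex(e: ET.Element) -> str:
    """Text of e that is not inside any <ex>, i.e. letters present in the MS."""
    out = e.text or ""
    for child in e:
        if child.tag != "ex":
            out += outside_ex(child)
        out += child.tail or ""  # a child's tail belongs to e, even after <ex>
    return out


expan = ET.fromstring("<expan><w>L<ex>EOFAN</ex></w> <w><ex>MEN</ex></w></expan>")

words = ["".join(w.itertext()) for w in expan.iter("w")]      # lexical words
supplied = ["".join(x.itertext()) for x in expan.iter("ex")]  # editorial letters
witnessed = [outside_ex(w) for w in expan.iter("w")]          # manuscript letters

print(words)      # ['LEOFAN', 'MEN']
print(supplied)   # ['EOFAN', 'MEN']
print(witnessed)  # ['L', '']
```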
However, I was thinking of another issue. You said that you need <w> only to "encode manuscript word spacing alongside lexical word spacing". As <expan> is definitely not on the manuscript level but only on the interpretative level, why would you need to separate <w> elements here at all? In the end, wouldn't it be sufficient to give the expansion simply within one <w>? The space between LEOFAN and MEN is editorial anyway. So why not try this as well:
<expan><w>L<ex>EOFAN MEN</ex></w></expan>