Paulo Moura

Reputation: 18663

Escaping special characters in index directives

I'm generating documentation for a Prolog system implementation using Sphinx. The Prolog language includes conjunction and disjunction control constructs that are represented by, respectively, the compound terms (',')/2 and (;)/2.

But the following index directives don't generate correct entries due to the presence of the comma and the semicolon:

.. index:: (',')/2

.. index:: (;)/2

I have been unable so far to find a character escaping solution. I also have the same problem with the Prolog !/0 control construct but there I found a workaround by writing:

.. index:: !!/0

I tried using a backslash, to no avail. Is there any support for escaping special characters in directives that I'm missing? Is there an alternative way to get (',')/2, (;)/2, and !/0 index entries?

Upvotes: 3

Views: 411

Answers (2)

KaKkouo

Reputation: 41

< (;)/2 >

The (;)/2 case depends on the split_into function, which is used by the IndexEntries class.

  • My guess at a fix: change value.split(';', n - 1) to value.split('; ', n - 1)
  • I left a comment on #8904.
  • Rewriting this code should be easy for you.

sphinx/util/__init__.py ( Sphinx 4.2.0 )

def split_into(n: int, type: str, value: str) -> List[str]:
    """Split an index entry into a given number of parts at semicolons."""
    parts = [x.strip() for x in value.split(';', n - 1)]
    if sum(1 for part in parts if part) < n:
        raise ValueError('invalid %s index entry %r' % (type, value))
    return parts
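
To see why (;)/2 ends up split in two, here is a minimal sketch that copies split_into from the listing above and makes the separator a parameter. The sep parameter and the split_single helper are hypothetical additions for illustration only. The fallback from two parts to one part is an assumption about how the index builder treats single entries; it is not shown in the excerpt.

```python
from typing import List, Tuple


def split_into(n: int, type: str, value: str, sep: str = ';') -> List[str]:
    """Copy of the quoted function; `sep` is a hypothetical extra parameter."""
    parts = [x.strip() for x in value.split(sep, n - 1)]
    if sum(1 for part in parts if part) < n:
        raise ValueError('invalid %s index entry %r' % (type, value))
    return parts


def split_single(value: str, sep: str = ';') -> Tuple[str, str]:
    """Hypothetical helper: try (entry, subentry), else fall back to one entry.

    Assumption: this mirrors how 'single' entries are handled downstream.
    """
    try:
        entry, subentry = split_into(2, 'single', value, sep)
    except ValueError:
        (entry,) = split_into(1, 'single', value, sep)
        subentry = ''
    return entry, subentry


# Current separator ';'
print(split_single('(;)/2'))
print(split_single(';'))

# Guessed separator '; ' (semicolon followed by a space)
print(split_single('(;)/2', sep='; '))
print(split_single('term; subterm', sep='; '))
```

With the current ';' separator, (;)/2 becomes the entry "(" with the subentry ")/2", while a lone ";" cannot be split into two non-empty parts and falls back to a single entry. With the guessed '; ' separator, (;)/2 stays whole. The trade-off is that an entry written as term;subterm, without the space, would no longer be split.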

< !!/0 >

The !!/0 case depends on the process_index_entry function, which is used by the index directive, and on similar code in the index role.

  • A workaround is to use !!!/0

sphinx/util/nodes.py ( Sphinx 4.2.0 )

def process_index_entry(entry: str, targetid: str
                        ) -> List[Tuple[str, str, str, str, Optional[str]]]:
    from sphinx.domains.python import pairindextypes

    indexentries: List[Tuple[str, str, str, str, Optional[str]]] = []
    entry = entry.strip()
    oentry = entry
    main = ''
    if entry.startswith('!'):
        main = 'main'
        entry = entry[1:].lstrip()
    for type in pairindextypes:
        if entry.startswith(type + ':'):
            value = entry[len(type) + 1:].strip()
            value = pairindextypes[type] + '; ' + value
            indexentries.append(('pair', value, targetid, main, None))
            break
    else:

sphinx/domains/index.py ( Sphinx 4.2.0 )

class IndexDirective(SphinxDirective):
    """
    Directive to add entries to the index.
    """
...
        for entry in arguments:
            indexnode['entries'].extend(process_index_entry(entry, targetnode['ids'][0]))
        return [indexnode, targetnode]
...

class IndexRole(ReferenceRole):
    def run(self) -> Tuple[List[Node], List[system_message]]:
...
        else:
            # otherwise we just create a single entry
            if self.target.startswith('!'):
                title = self.title[1:]
                entries = [('single', self.target[1:], target_id, 'main', None)]
            else:
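
Both excerpts treat one leading "!" as a "main entry" marker and remove it. The following is a minimal sketch of that step alone; strip_main_marker is a hypothetical helper written for illustration, not a Sphinx function, and it ignores everything the real code does after the marker is consumed.

```python
from typing import Tuple


def strip_main_marker(entry: str) -> Tuple[str, str]:
    """Hypothetical helper modelling only the leading-'!' step in the excerpts.

    Returns (main, text): main is 'main' if one leading '!' was consumed
    as the "main entry" marker, otherwise ''.
    """
    entry = entry.strip()
    main = ''
    if entry.startswith('!'):
        main = 'main'
        entry = entry[1:].lstrip()
    return main, entry


print(strip_main_marker('!/0'))           # the only '!' is consumed
print(strip_main_marker('!!/0'))          # one '!' survives
print(strip_main_marker('single: !!/0'))  # no leading '!', nothing consumed
```

Under this model, a bare !/0 loses its only "!", which is why doubling it works as a workaround. An entry that starts with an explicit type such as single: does not start with "!", so nothing is stripped from it.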

Other

The ideal, but hard, solution would be to develop sphinx/domains/prolog.py.

Upvotes: 3

bad_coder

Reputation: 12890

The .. index:: directive is specific to Sphinx; it is not a standard reST directive.

I think this is a Sphinx bug. Note that the following example, which uses only a semicolon, would break if you didn't use the name: option:

.. index::
    single: ;
    name: aa

Gives the following HTML:

<li><a href="my_index.html#index-2">;</a></li>

The two examples using the name: option

.. index::
    single: (',')/2
    name: aaa

.. index::
    single: !!/0
    name: aaaaa

Give the following HTML:

<li><a href="my_index.html#index-5">!!/0</a></li>
<li><a href="my_index.html#index-3">(&#39;,&#39;)/2</a></li>

But now if we use

.. index::
    single: (;)/2
    name: a

It gives this HTML:

      <li>
    (

      <ul>
        <li><a href="my_index.html#index-1">)/2</a>
</li>
      </ul></li>

So it's probably a bug in parsing the names: there's no reason why the semicolon should introduce an additional <ul> pair in the middle of the <li>.

Initially one would tend to blame the symbols in the names and try to escape them. Next you'd question whether the semicolon itself is permitted, since those fields are likely to be used in HTML and are thus subject to identifier normalization of class names and identifier keys. Looking at the docutils spec:

ID and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").

But that is disproved by the simple fact that using the name: and single: options solved two cases, one of them with an isolated semicolon. The name: option itself is recent; see issue #1671. Judging by issue #7031, the permitted characters have also changed recently. And finally, there has been a suspiciously similar semicolon issue, #8405, recently...
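
The quoted token rule can be checked against the three Prolog indicators with a short sketch. It assumes the rule translates directly into a regular expression; is_valid_id_token is a hypothetical helper, not docutils' own normalization code.

```python
import re

# Assumption: a literal transcription of the quoted rule, i.e. a letter
# followed by letters, digits, hyphens, underscores, colons, or periods.
ID_TOKEN = re.compile(r'[A-Za-z][A-Za-z0-9\-_:.]*')


def is_valid_id_token(text: str) -> bool:
    """Return True if the whole string matches the quoted ID/NAME rule."""
    return ID_TOKEN.fullmatch(text) is not None


for candidate in ['index-1', 'aaa', "(',')/2", '(;)/2', '!/0']:
    print(repr(candidate), is_valid_id_token(candidate))
```

None of the three indicators is a valid ID token under this rule, yet two of them are indexed correctly above. That fits the conclusion that the entry text is not what this rule constrains.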

Side note:

Since this is a Prolog thread, I'll mention that "Grammar production displays" might offer something for your documentation, using the .. productionlist:: directive and the :token: role. I haven't seen it used. Apparently you only need to copy a (BNF) grammar of Prolog.

Upvotes: 2
