Where is the actual character set for Ada program text defined?

Question

I'm trying to make a tree-sitter parser, so that IDEs (in this case, Vim) can parse and do more advanced manipulation of Ada program text, such as extract-subprogram and rename-variable. But there seem to be some problems defining the character set.

In the Ada 2012 Reference Manual, I found a list of vague category descriptions of the form 'Any character whose General Category is X'. Taken literally, this means that besides the underscore, all of these ( ‿ ⁀ ⁔ ︳︴﹍﹎﹏＿) are also allowed in an identifier. That seems absurd, and GNAT rejects them with 'illegal character'. The list is prefaced by this statement:

"The actual set of graphic symbols used by an implementation for the visual representation of the text of an Ada program is not specified."

Does that really mean there's no way to know which characters should be accepted?
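For what it's worth, the category claim itself can be checked outside any Ada compiler. Below is a minimal Python sketch (not Ada, and not taken from the Reference Manual) that looks up the Unicode General Category of each character listed above using the standard `unicodedata` module. The code points in `CANDIDATES` are my assumption about which characters those are, and the name `general_categories` is just a helper I made up for illustration. The result depends on the Unicode database bundled with the Python version in use, which need not match the Unicode version the Ada standard refers to.

```python
import unicodedata

# Assumed code points for the characters quoted in the question,
# with the ASCII underscore (U+005F) first for comparison.
CANDIDATES = "\u005F\u203F\u2040\u2054\uFE33\uFE34\uFE4D\uFE4E\uFE4F\uFF3F"


def general_categories(text):
    """Map each character to its Unicode General Category code."""
    return {ch: unicodedata.category(ch) for ch in text}


categories = general_categories(CANDIDATES)

if __name__ == "__main__":
    # Print code point, category, and official character name.
    for ch, cat in categories.items():
        print(f"U+{ord(ch):04X}  {cat}  {unicodedata.name(ch)}")
```

If every entry comes back as `Pc` (punctuation, connector), the characters do fall in the same General Category as the underscore. That only confirms the literal reading of the category rule. It says nothing about what a given compiler accepts.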

Two pages on, these examples are explicitly given as valid identifiers, and yet GNAT 2021 rejects them:

procedure Main is
   Πλάτων  : constant := 12;     -- Plato
   Чайковский : constant := 12;  -- Tchaikovsky
   θ, φ : constant := 12;        -- Angles
begin
   null;
end Main;

$ gprbuild
using project file foo.gpr
Compile
   [Ada]          main.adb
main.adb:2:04: error: declaration expected
main.adb:2:05: error: illegal character
main.adb:3:04: error: declaration expected
main.adb:3:05: error: illegal character
main.adb:4:05: error: illegal character
gprbuild: *** compilation phase failed

Where is the actual character set for Ada programs defined? Has GNAT 2021 got it wrong?

An example program using Unicode characters in identifiers is below for your experimentation. Note that the use of wide characters in the literal string is outside the scope of the question.

main.adb:

with Ada.Wide_Text_IO; use Ada.Wide_Text_IO;

procedure Main is
   δεδομένα_πράμα : constant Wide_String := "Ο Πλάτων θα ενέκρινε";
begin
   Put_Line (Δεδομένα_πράμα);
end Main;

foo.gpr:

project foo is

   for Source_Dirs use (".");
   for Main use ("main.adb");

   package Compiler is
      for Default_Switches ("ada") use ("-gnatW8", "-gnatiw");
   end Compiler;

end foo;

To build & run:

gprbuild
./main
