ktp

Reputation: 126

SWI-Prolog char_type/2 with ascii / alnum: why so many chars, and how to fix it?

I just wanted to check which chars SWI-Prolog treats as 'alnum'. My query was:

    findall(X,char_type(X,alnum),Lalnum),length(Lalnum,N).

and SWI-Prolog's answer was:

    Lalnum = ['0', '1', '2', '3', '4', '5', '6', '7', '8'|...],
    N = 816459.

I was very surprised: why so many? Then I decided to check the pure 'ascii' set. After all, according to the documentation page:

    http://www.swi-prolog.org/pldoc/doc_for?object=char_type/2

there are only 128 of them (the 7-bit character set). So my next query was:

    findall(X,char_type(X,ascii),Lascii),length(Lascii,N).

and SWI-Prolog's answer was:

    Lascii = ['\000\', '\001\', '\002\', '\003\', '\004\', 
    '\005\', '\006\', '\a', '\b'|...], 
    N = 2176.

I was even more surprised than before. What is wrong? Where is the problem: in my query, in my SWI-Prolog installation, or in my system? My setup is:

SWI-Prolog 7.7.13, with ascii encoding:

    current_prolog_flag(encoding,X).
    X = ascii.

Windows 8.1 64-bit, with code page 852.

And how can I fix it?

Thank you in advance.

EDIT: I think I've found the answer to my second question, how to fix it. It seems that the additional goal:

    sort(Lascii,SortedLascii)

removes the duplicates and leaves just the basic set of 128 chars.
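The same fix can be wrapped in a small helper. This is only a sketch: `count_distinct_chars/2` is a hypothetical name, and it relies only on `findall/3`, `sort/2` (which removes duplicate elements) and `length/2`.

```prolog
% Hypothetical helper: count the distinct characters of a
% char_type/2 class. sort/2 removes duplicate elements, so a
% character that the enumeration yields several times is
% counted only once.
count_distinct_chars(Type, N) :-
    findall(X, char_type(X, Type), Xs),
    sort(Xs, Unique),
    length(Unique, N).
```

For example, `?- count_distinct_chars(ascii, N).` should then report the 128 chars the documentation describes, however many repeated solutions the raw enumeration produces.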

But I still do not understand why the first query generates so many results.
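One way to check whether the extra solutions are repetitions rather than different characters is to count how often a single char turns up in the enumeration. A sketch, assuming SWI-Prolog's `aggregate_all/3` from library(aggregate); `solution_count/3` is a hypothetical name:

```prolog
:- use_module(library(aggregate)).

% Hypothetical diagnostic: count how many solutions of
% char_type(X, Type), with X unbound, are identical to Char.
% A result above 1 means the enumeration repeats that char.
solution_count(Type, Char, Count) :-
    aggregate_all(count, ( char_type(X, Type), X == Char ), Count).
```

For example, `?- solution_count(ascii, a, Count).` With the numbers from the question (2176 solutions, 128 distinct chars), each ASCII char would appear 2176 / 128 = 17 times on average.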


Answers (1)

Valera Grishin

Reputation: 441

The reason for so many characters is Unicode: char_type/2 returns all the relevant characters, not just the ASCII ones, depending on your current locale.
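To see the Unicode part directly, you can collect only the alphanumeric chars that are not ASCII. A sketch; `non_ascii_alnum/1` is a hypothetical name:

```prolog
% Hypothetical helper: collect the chars that char_type/2 classifies
% as alnum but not as ascii, i.e. alphanumerics with a code >= 128.
non_ascii_alnum(Chars) :-
    findall(C, ( char_type(C, alnum), \+ char_type(C, ascii) ), Chars).
```

For example, `?- non_ascii_alnum(L), length(L, N).` Every element has a character code of 128 or more; which ones appear depends on the Unicode tables (and, as noted above, the locale).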

Including Unicode:

Letters only:

    ?- findall(C, char_type(C, alpha), L), length(L, Len).
    L = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'|...],
    Len = 2568.

Alphanumeric characters:

    ?- findall(C, char_type(C, alnum), L), length(L, Len).
    L = ['0', '1', '2', '3', '4', '5', '6', '7', '8'|...],
    Len = 2578.

Now ASCII only:

Letters only:

    ?- findall(C, (char_type(C, alpha), char_type(C, ascii)), L), length(L, Len).
    L = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'|...],
    Len = 52.

Alphanumerics:

    ?- findall(C, (char_type(C, alnum), char_type(C, ascii)), L), length(L, Len).
    L = ['0', '1', '2', '3', '4', '5', '6', '7', '8'|...],
    Len = 62.
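An alternative that never enumerates the Unicode range is to generate the 128 ASCII codes with `between/3` and test each one. A sketch; `ascii_chars_of_type/2` is a hypothetical name:

```prolog
% Hypothetical helper: collect the ASCII chars (codes 0..127) of a
% given char_type/2 class. Because the codes come from between/3,
% each char is tested once and the result is in code order.
ascii_chars_of_type(Type, Chars) :-
    findall(C,
            ( between(0, 127, Code),
              char_code(C, Code),
              char_type(C, Type)
            ),
            Chars).
```

For example, `?- ascii_chars_of_type(alnum, L), length(L, N).` should agree with the 62 ASCII alphanumerics counted above.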

What's causing the confusion?

Because the number of returned items is large, the toplevel cuts the printed answer and replaces the omitted items with an ellipsis (|...). More details here: https://www.swi-prolog.org/FAQ/AllOutput.html

To change this behavior and see the complete output, set the following flag:

    ?- set_prolog_flag(
           answer_write_options,
           [
               quoted(true),
               portray(true),
               spacing(next_argument)
           ]
       ).

This way you'll see all the Unicode characters and won't be confused any more. Note that the only difference from the default is the absence of max_depth(10).
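If you only need to inspect one result, an alternative to changing the flag is to print the list yourself. writeq/1 writes the complete term and, unlike the toplevel's answer printing, has no max_depth limit. A sketch; `show_ascii_alnum/0` is a hypothetical name:

```prolog
% Hypothetical predicate: print every ASCII alphanumeric char.
% writeq/1 quotes atoms where needed (e.g. 'Z', '0') and writes
% the whole list without the toplevel's depth truncation.
show_ascii_alnum :-
    findall(C, ( char_type(C, alnum), char_type(C, ascii) ), L),
    writeq(L),
    nl.
```

Calling `?- show_ascii_alnum.` prints the full list, with no `|...` at the end.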
