Reputation: 126
I wanted to check which characters SWI-Prolog treats as 'alnum'. My query was:
findall(X,char_type(X,alnum),Lalnum),length(Lalnum,N).
and SWI-Prolog's answer was:
Lalnum = ['0', '1', '2', '3', '4', '5', '6', '7', '8'|...],
N = 816459.
I was very surprised: why so many? Then I decided to check the plain 'ascii' set. After all, according to the documentation page:
http://www.swi-prolog.org/pldoc/doc_for?object=char_type/2
there are only 128 of them (a 7-bit character set). My next query was:
findall(X,char_type(X,ascii),Lascii),length(Lascii,N).
and SWI-Prolog's answer was:
Lascii = ['\000\', '\001\', '\002\', '\003\', '\004\',
'\005\', '\006\', '\a', '\b'|...],
N = 2176.
I was even more surprised. What is wrong? Is the problem with my query, with my SWI-Prolog installation, or with my system? My setup is:
SWI-Prolog 7.7.13, with ascii encoding:
current_prolog_flag(encoding,X).
X = ascii.
Windows 8.1 64-bit, with code page 852.
And how can I fix it?
Thank you in advance
EDIT: I have probably found the answer to my second question, 'how to fix it'. It seems that an additional goal:
sort(Lascii,SortedLascii)
removes the duplicates and leaves just the basic set of 128 characters.
But I still do not understand why the first query generates so many results.
Upvotes: 1
Views: 398
Reputation: 441
The reason for so many characters is Unicode. char_type/2 returns all matching characters, and which ones match depends on your current locale.
Letters only:
?- findall(C, char_type(C, alpha), L), length(L, Len).
L = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'|...],
Len = 2568.
Alphanumeric characters:
?- findall(C, char_type(C, alnum), L), length(L, Len).
L = ['0', '1', '2', '3', '4', '5', '6', '7', '8'|...],
Len = 2578.
To restrict the result to ASCII, add char_type(C, ascii) to the goal. ASCII letters only:
?- findall(C, (char_type(C, alpha), char_type(C, ascii)), L), length(L, Len).
L = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'|...],
Len = 52.
ASCII alphanumerics:
?- findall(C, (char_type(C, alnum), char_type(C, ascii)), L), length(L, Len).
L = ['0', '1', '2', '3', '4', '5', '6', '7', '8'|...],
Len = 62.
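The ASCII-restricted query can be combined with the sort/2 deduplication mentioned in the question, so the count does not depend on whether a given installation reports each character more than once. This is only a minimal sketch: the predicate names distinct_chars/2 and ascii_alnum/1 are hypothetical helpers made up for illustration, not part of SWI-Prolog.

```prolog
% distinct_chars(+Type, -Chars)
% Hypothetical helper: Chars is the sorted, duplicate-free list of
% characters for which char_type/2 succeeds with the given Type.
distinct_chars(Type, Chars) :-
    findall(C, char_type(C, Type), All),
    sort(All, Chars).              % sort/2 also removes duplicates

% ascii_alnum(-Chars)
% Hypothetical helper: alphanumeric characters that are also 7-bit ASCII.
ascii_alnum(Chars) :-
    findall(C, ( char_type(C, alnum),
                 char_type(C, ascii) ),
            All),
    sort(All, Chars).
```

A query such as ?- ascii_alnum(Cs), length(Cs, N). then reports the number of distinct characters rather than the raw number of solutions.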
Because the list is so long, the toplevel truncates the output and replaces the omitted items with an ellipsis. More details here: https://www.swi-prolog.org/FAQ/AllOutput.html
To change this behavior and see the complete output, set the following flag:
?- set_prolog_flag(
       answer_write_options,
       [
           quoted(true),
           portray(true),
           spacing(next_argument)
       ]
   ).
This way you'll see all the Unicode characters and won't be confused any more.
Note that the only difference from the default is the absence of max_depth(10).
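If you would rather not change the flag, one alternative is to print the list yourself with write_term/2, where max_depth(0) means no depth limit. This is a sketch only, and print_full/1 is a hypothetical helper name, not a built-in.

```prolog
% print_full(+Term)
% Hypothetical helper: write Term quoted and without a depth limit,
% followed by a newline. max_depth(0) disables truncation in write_term/2.
print_full(Term) :-
    write_term(Term, [quoted(true), max_depth(0)]),
    nl.
```

For example, ?- findall(C, (char_type(C, alnum), char_type(C, ascii)), L), print_full(L). writes the whole list. The toplevel will still show its own abbreviated binding for L afterwards.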
Upvotes: 1