Reputation: 56931
I want to write a test script in Python wherein I set the re.LOCALE flag, match with \S, and see the non-whitespace characters returned for that LOCALE. Now, how could I achieve that? Which LOCALE should I choose to see a clear difference in output from ASCII?
# -*- Proper encoding -*-
import re
# Raw bytes pattern: on Python 3, re.LOCALE is only accepted with bytes patterns.
pat = re.compile(br'\s*', re.LOCALE)
string = b"string" # Proper Replacement String?
result = pat.match(string)
print(result.group(0))
I am using Ubuntu, and the following is the current locale of my shell:
$ locale
LANG=en_SG.UTF-8
LANGUAGE=en_SG:en
LC_CTYPE="en_SG.UTF-8"
LC_NUMERIC="en_SG.UTF-8"
LC_TIME="en_SG.UTF-8"
LC_COLLATE="en_SG.UTF-8"
LC_MONETARY="en_SG.UTF-8"
LC_MESSAGES="en_SG.UTF-8"
LC_PAPER="en_SG.UTF-8"
LC_NAME="en_SG.UTF-8"
LC_ADDRESS="en_SG.UTF-8"
LC_TELEPHONE="en_SG.UTF-8"
LC_MEASUREMENT="en_SG.UTF-8"
LC_IDENTIFICATION="en_SG.UTF-8"
LC_ALL=
BTW, I have little experience with Unicode- or locale-aware input/output (if that matters). All I know is that I can type Unicode letters using code points on the terminal.
Upvotes: 1
Views: 1037
Reputation: 56931
Answering my own question after digging around the source code.
In the Python source code (_sre.c), the definition of a LOCALE space is this:
#define SRE_LOC_IS_SPACE(ch) (!((ch) & ~255) ? isspace((ch)) : 0)
The definition of the NON_SPACE category is simply the negation of space. That's it.
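To make the bit test in that macro concrete, here is a rough Python model of it. The names loc_is_space and c_locale_isspace are hypothetical, and c_locale_isspace is only a stand-in for C's isspace() in the plain "C" locale; a real locale may answer differently for values in 128..255.

```python
# Stand-in for C isspace() in the "C" locale (assumption: in other
# locales the real function may also accept some values in 128..255).
C_LOCALE_SPACES = frozenset(b" \t\n\r\f\v")


def c_locale_isspace(ch):
    return ch in C_LOCALE_SPACES


def loc_is_space(ch, isspace=c_locale_isspace):
    """Rough model of SRE_LOC_IS_SPACE(ch)."""
    # (ch & ~255) is zero only when ch fits in 8 bits (0..255).
    if ch & ~255:
        return False  # values above 255 are never passed to isspace
    return bool(isspace(ch))


print(loc_is_space(ord(" ")))  # ASCII space, goes through isspace
print(loc_is_space(0x2003))    # EM SPACE, rejected by the mask alone
```

Whatever isspace says, a code point above 255 can never count as a LOCALE space under this definition.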
Given that definition, we see that for character values higher than 255 the check is not made at all. So only the simple ASCII isspace is considered when the LOCALE flag is set, and in effect the re.LOCALE flag has no extra effect on matching whitespace or non-whitespace characters.
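One way to check this empirically is to try every 8-bit value against \s with and without re.LOCALE. This is only a sketch for Python 3, where re.LOCALE is accepted only with bytes patterns. matching_bytes is a hypothetical helper, and setlocale(LC_ALL, "") simply picks up whatever locale the environment provides, so the printed lists may vary by platform.

```python
import locale
import re


def matching_bytes(pattern):
    """Return the byte values (0..255) that the compiled pattern matches."""
    return [b for b in range(256) if pattern.match(bytes([b]))]


try:
    # Use the locale from the environment (e.g. en_SG.UTF-8).
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    pass  # keep whatever locale is already active

plain = matching_bytes(re.compile(br"\s"))
with_locale = matching_bytes(re.compile(br"\s", re.LOCALE))

print("plain :", [hex(b) for b in plain])
print("LOCALE:", [hex(b) for b in with_locale])
print("extra under LOCALE:", [hex(b) for b in with_locale if b not in plain])
```

If the last list comes out empty, the flag made no difference to \s for any 8-bit character in that locale.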
For Unicode, the logic is handled in unicodeobject.c, and as far as I can see it is just a superset of ASCII whitespace: all ASCII whitespace characters are Unicode whitespace characters too.
Given this, it is impossible to write a program in Python where you can test for whitespace characters that are exclusive to a locale or to Unicode, excluding the ASCII whitespace characters.
Upvotes: 1