Testing whitespace using Regex with LOCALE and UNICODE flags in Python

Question

I want to write a test script in Python in which:

I give it a string in a non-ASCII locale that has a different set of whitespace characters, then use '\s' with the re.LOCALE flag to see the output.
I would also like to do the complement: use '\S' and see the non-whitespace characters returned for that locale.

Now, how can I achieve that? Which locale should I choose to see a clear difference in output from ASCII?

# -*- Proper encoding -*-
import re
pat = re.compile(r'\s*', re.LOCALE)  # raw string so '\s' reaches the regex engine unchanged
string = "string"  # Proper Replacement String?
result = pat.match(string)
print(result.group(0))
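
For illustration, a minimal sketch of the kind of comparison described above, assuming Python 3, where str patterns are Unicode-aware by default and re.ASCII restricts '\s' to ASCII whitespace. It deliberately sidesteps re.LOCALE, whose behaviour depends on which locales are installed on the system. The sample string and the helper names (leading_whitespace, non_whitespace_runs) are made up for this example.

```python
import re

# Hypothetical sample: NO-BREAK SPACE (U+00A0), EM SPACE (U+2003),
# an ordinary ASCII space, then some text.
SAMPLE = "\u00a0\u2003 text"


def leading_whitespace(text, flags=0):
    # Return the leading run matched by \s* under the given flags.
    return re.match(r"\s*", text, flags).group(0)


def non_whitespace_runs(text, flags=0):
    # Return every run matched by \S+ under the given flags.
    return re.findall(r"\S+", text, flags)


# Unicode-aware matching (the default for str patterns in Python 3).
unicode_ws = leading_whitespace(SAMPLE)
# ASCII-only matching: U+00A0 and U+2003 no longer count as whitespace.
ascii_ws = leading_whitespace(SAMPLE, re.ASCII)

# ascii() keeps the output readable regardless of the terminal encoding.
print("Unicode \\s*:", ascii(unicode_ws))
print("ASCII   \\s*:", ascii(ascii_ws))
print("Unicode \\S+:", ascii(non_whitespace_runs(SAMPLE)))
print("ASCII   \\S+:", ascii(non_whitespace_runs(SAMPLE, re.ASCII)))
```

Under re.ASCII the two non-ASCII space characters show up in the '\S+' results instead of the '\s*' match, which is the kind of visible difference the test is meant to expose.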

I am using Ubuntu, and the following is the current locale of my shell:

$ locale
LANG=en_SG.UTF-8
LANGUAGE=en_SG:en
LC_CTYPE="en_SG.UTF-8"
LC_NUMERIC="en_SG.UTF-8"
LC_TIME="en_SG.UTF-8"
LC_COLLATE="en_SG.UTF-8"
LC_MONETARY="en_SG.UTF-8"
LC_MESSAGES="en_SG.UTF-8"
LC_PAPER="en_SG.UTF-8"
LC_NAME="en_SG.UTF-8"
LC_ADDRESS="en_SG.UTF-8"
LC_TELEPHONE="en_SG.UTF-8"
LC_MEASUREMENT="en_SG.UTF-8"
LC_IDENTIFICATION="en_SG.UTF-8"
LC_ALL=

BTW, I have little experience with Unicode- or locale-aware input/output (if that matters). All I know is that I can type Unicode characters by their code points in the terminal.

