Reputation: 3213
I am trying to write a regular express in Lucee to mimic the JS on the front end. Since Lucee's regex doesn't seem to suppoert unicode how do I do it.
This is the JS
function charTest(k){
var regexp = /^[\u00C0-\u00ff\s -\~]+$/;
return regexp.test(k)
}
if(!charTest(thisKey)){
alert("Please Use Latin Characters Only");
return false;
}
This is what I have tried in Lucee
regexp = '[\u00C0-\u00ff\s -\~]+/';
writeDump(reFind(regexp,"测));
writeDump(reFind(regexp,"test));
I have also tried
regexp = "[\\p{L}]";
but the dump is always 0
Upvotes: 4
Views: 144
Reputation: 4786
EDIT: Give me one second. I think I interpreted your initial JS regex incorrectly. Fixing it.
EDIT 2: It was more than a second. Your original JS regex was:
"/^[\u00C0-\u00ff\s -\~]+$/"
. This is:
Basic parts of regex:
"/..../" == signifies the start and stop of the Regex.
"^[...]" == signifies anything that is NOT in this group
"+" == signifies at least one of the previous
"$" == signifies the end of the string
Identifiers in the regex:
"\u00c0-\u00ff" == Unicode character range of Character 192 (À)
to Character 255 (ÿ). This is the Latin 1
Extension of the Unicode character set.
"\s" == signifies a Space Character
" -\~" == signifies another identifier for a space character to the
(escaped) tilde character (~). This is ASCII 32-126, which
includes the printable characters of ASCII (except the DEL
character (127). This includes alpha-numerics amd most punctuation.
I missed the second half of your printable Latin basic character set. I've updated my regex and tests to include it. There are ways to shorthand some of these identifiers, but I wanted it to be explicit.
You can try this:
<cfscript>
//http://www.asciitable.com/
//https://en.wikipedia.org/wiki/List_of_Unicode_characters
//https://en.wikipedia.org/wiki/Latin_script_in_Unicode
function charTest(k) {
return
REfind("[^"
& chr(32) & "-" & chr(126)
& chr(192) & "-" & chr(255)
& "]",arguments.k)
? "Please Use Latin Characters Only"
: ""
;
}
// TESTS
writeDump(charTest("测")); // Not Latin
writeDump(charTest("test")); // All characters between 31 & 126
writeDump(charTest("À")); // Character 192 (in range)
writeDump(charTest("À ")); // Character 192 and Space
writeDump(charTest(" ")); // Space Characters
writeDump(charTest("12345")); // Digits ( character 48-57 )
writeDump(charTest("ð")); // Character 240 (in range)
writeDump(charTest("ℿ")); // Character 8511 (outside range)
writeDump(charTest(chr(199))); // CF Character (in range)
writeDump(charTest(chr(10))); // CF Line Feed Character (outside range)
writeDump(charTest(chr(1000))); // CF Character (outside range)
writeDump(charTest("
")); // CRLF (outside range)
writeDump(charTest(URLDecode("%00", "utf-8"))); // CF Null character (outside range)
//writeDump(asc("测"));
//writeDump(asc("test"));
//writeDump(asc("À"));
//writeDump(asc("ð"));
//writeDump(asc("ℿ"));
</cfscript>
https://trycf.com/gist/05d27baaed2b8fc269f90c7c80a1aa82/lucee5?theme=monokai
All the regex does is look at your input string and if it doesn't find a value between chr(192)
and chr(255)
, it will return your chosen string, else it will return nothing.
I think you can access the UNICODE characters below 255 directly. I'll have to test it.
Do you need to alert this function, like the Javascript? If you need to, you can just output a 1 or 0 to determine if this function actually found the character you're looking for.
Upvotes: 2