How do I mimic a Unicode JS regular expression in Lucee

Question

I am trying to write a regular expression in Lucee to mimic the JS on the front end. Since Lucee's regex doesn't seem to support Unicode, how do I do it?

This is the JS

function charTest(k){
    var regexp = /^[\u00C0-\u00ff\s -\~]+$/;
    return regexp.test(k)
}

if(!charTest(thisKey)){
    alert("Please Use Latin Characters Only");
    return false;
}

This is what I have tried in Lucee

regexp = '[\u00C0-\u00ff\s -\~]+/';
writeDump(reFind(regexp,"测"));
writeDump(reFind(regexp,"test"));

I have also tried

 regexp = "[\p{L}]";

but the dump is always 0

Shawn · Accepted Answer

EDIT: Give me one second. I think I interpreted your initial JS regex incorrectly. Fixing it.

EDIT 2: It was more than a second. Your original JS regex was: "/^[\u00C0-\u00ff\s -\~]+$/". It breaks down like this:

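Basic parts of regex:
"/..../" == signifies the start and stop of the Regex literal.
"^" == signifies the start of the string (outside the brackets it is an
       anchor; only "[^...]" would mean anything NOT in the group)
"[...]" == signifies any one character in this group
"+" == signifies one or more of the previous
"$" == signifies the end of the string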

Identifiers in the regex:
"\u00c0-\u00ff" == Unicode character range from Character 192 (À)
                   to Character 255 (ÿ). This is the upper half of
                   the Latin-1 Supplement block: mostly accented
                   letters, plus × and ÷.
"\s" == signifies any whitespace character: space, tab, line feed,
        carriage return, and (in JS) some Unicode spaces such as U+00A0.
" -\~" == signifies a range from the space character to the
          (escaped) tilde character (~). This is ASCII 32-126, the
          printable ASCII characters (DEL, 127, is not included). It
          covers letters, digits, and all ASCII punctuation.
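A quick way to check this breakdown is to run the pieces in JavaScript itself. The sketch below is only an illustration (the sample strings are made up): it contrasts "^" as a start anchor with "^" as a negation inside the brackets, prints the code points at the edges of each range, and shows that "\s" accepts more than a plain space.

```javascript
// The question's original pattern: "^" and "$" anchor the whole string,
// "[...]" is the allowed set, "+" means one or more characters.
const latinOnly = /^[\u00C0-\u00ff\s -\~]+$/;

// "^" just inside the brackets negates the set instead: this matches ONE
// character that is NOT allowed, anywhere in the string.
const notAllowed = /[^\u00C0-\u00ff\s -\~]/;

// For any non-empty string the two tests give opposite answers.
for (const s of ["test", "À ", "测", "abc测"]) {
  console.log(JSON.stringify(s), latinOnly.test(s), notAllowed.test(s));
}

// Code points at the edges of the two ranges: " " to "~" and "À" to "ÿ".
const rangeEdges = [" ", "~", "\u00C0", "\u00FF"].map((c) => c.codePointAt(0));
console.log(rangeEdges); // [ 32, 126, 192, 255 ]

// "\s" is broader than a plain space: tab, CR, LF and some Unicode
// spaces such as U+00A0 (no-break space) are all accepted.
console.log(latinOnly.test("a\tb\r\n")); // true
console.log(latinOnly.test("\u00A0"));   // true
// DEL (127) is in neither range and is not whitespace.
console.log(latinOnly.test("\u007F"));   // false
```

The complement relationship between the anchored pattern and the single negated class is what makes a "find one bad character" check on the server equivalent to the front-end "whole string is allowed" check.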

I initially missed the second half of your character set (the printable ASCII range). I've updated my regex and tests to include it. There are shorthand ways to write some of these identifiers, but I wanted to be explicit.

You can try this:


//http://www.asciitable.com/
//https://en.wikipedia.org/wiki/List_of_Unicode_characters
//https://en.wikipedia.org/wiki/Latin_script_in_Unicode


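function charTest(k) {
  // Build the negated class [^ -~À-ÿ]: it matches any single character
  // that is NOT printable ASCII (32-126) and NOT in the 192-255 range.
  // REFind returns the position of the first match (0 if none), so any
  // disallowed character makes the condition true.
  return REfind("[^"
      & chr(32) & "-" & chr(126)
      & chr(192) & "-" & chr(255)
      & "]", arguments.k)
    ? "Please Use Latin Characters Only"
    : "";
}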


// TESTS
writeDump(charTest("测")); // Not Latin
writeDump(charTest("test")); // All characters between 32 & 126
writeDump(charTest("À")); // Character 192 (in range)
writeDump(charTest("À ")); // Character 192 and Space
writeDump(charTest("     ")); // Space Characters
writeDump(charTest("12345")); // Digits ( character 48-57 )
writeDump(charTest("ð")); // Character 240 (in range) 
writeDump(charTest("ℿ")); // Character 8511 (outside range)
writeDump(charTest(chr(199))); // CF Character (in range)
writeDump(charTest(chr(10))); // CF Line Feed Character (outside range)
writeDump(charTest(chr(1000))); // CF Character (outside range)

writeDump(charTest(chr(13) & chr(10))); // CRLF (outside range)

writeDump(charTest(URLDecode("%00", "utf-8"))); // CF Null character (outside range)

//writeDump(asc("测"));
//writeDump(asc("test"));
//writeDump(asc("À"));
//writeDump(asc("ð"));
//writeDump(asc("ℿ"));

https://trycf.com/gist/05d27baaed2b8fc269f90c7c80a1aa82/lucee5?theme=monokai

All the regex does is scan your input string for any character that is outside both allowed ranges (chr(32)-chr(126) and chr(192)-chr(255)). If it finds one, the function returns your chosen message; otherwise it returns an empty string.
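To compare this approach against your front-end pattern, here is a rough JavaScript mirror of charTest(). It is a sketch for side-by-side testing, not something the server needs: it builds the same negated class from character codes (String.fromCharCode standing in for CFML's chr()) and prints its result next to the original JS regex for a few sample strings.

```javascript
// Build "[^ -~À-ÿ]" from character codes, mirroring the chr() calls.
const disallowed = new RegExp(
  "[^" +
    String.fromCharCode(32) + "-" + String.fromCharCode(126) +  // " " to "~"
    String.fromCharCode(192) + "-" + String.fromCharCode(255) + // "À" to "ÿ"
  "]"
);

// Any single character outside both ranges triggers the message.
function charTest(k) {
  return disallowed.test(k) ? "Please Use Latin Characters Only" : "";
}

// The question's original whole-string pattern, for comparison.
const original = /^[\u00C0-\u00ff\s -\~]+$/;

for (const s of ["测", "test", "À ", "12345", "ð", "ℿ", "\n", ""]) {
  console.log(JSON.stringify(s), JSON.stringify(charTest(s)), original.test(s));
}
```

Running it shows two inputs where the versions disagree: "\n" is rejected here but accepted by the JS pattern (through "\s"), and the empty string passes here but fails the JS pattern (whose "+" needs at least one character).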

I think you can also type the Unicode characters up to 255 directly into the pattern instead of building them with chr(). I'll have to test it.
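In JavaScript, at least, the escaped and literal spellings of the class are interchangeable, as the small check below shows. Whether Lucee's regex engine treats a literal "À-ÿ" range the same way is exactly the part that still needs testing, so treat this as a hint rather than proof. It assumes the script file is saved as UTF-8.

```javascript
// Two spellings of the same negated class: \u escapes vs. literal characters.
const escaped = /[^\u0020-\u007E\u00C0-\u00FF]/;
const literal = /[^ -~À-ÿ]/;

// Compare them on every code point from 0 up to U+02FF.
let mismatches = 0;
for (let cp = 0; cp <= 0x2ff; cp++) {
  const ch = String.fromCodePoint(cp);
  if (escaped.test(ch) !== literal.test(ch)) mismatches++;
}
console.log(mismatches); // 0
```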

Do you need this function to show an alert, like the JavaScript does? If not, you can just return a 1 or 0 to indicate whether the function found a character outside the allowed ranges.
