Matching words as separate strings unless they start with a capital letter

Question

I have this regexp

/[A-Za-zÀ-ÿ]+/g

that matches 'words' of any length made up of these characters.

What if I want to exclude words starting with a capital letter?

I tried

/(^[A-Z])[A-Za-zÀ-ÿ]+/g

but it doesn't seem to work. I can't use things like \w because it doesn't include diacritics.

EDIT: the language in use is TypeScript, so the regex runs on the JavaScript engine (which doesn't allow lookbehind, for example). Sorry for not mentioning this.

EDIT: the input can be something like

"foo"            //should match foo and return true
"Foo"            //should not match Foo and return false
"fòo"            //should match fòo and return true
" "              //should not match and return false
"."              //should not match and return false
","              //should not match and return false

Code (TypeScript) that matches words, without the capital-letter restriction yet

function isProperWord(word: string): boolean {
    /* should reject
      - strings that are not words (symbols, spaces, etc.)
      - names (words starting with a capital letter)
    */
    // test() returns true when the string contains at least one letter from the range
    return /[A-Za-zÀ-ÿ]+/.test(word);
}

Wiktor Stribiżew · Accepted Answer

To match all the capital letters in your initial range, you may use the [A-ZÀ-ÖØ-Þ] character class. To match all the lowercase letters, use [a-zß-öø-ÿ]. Note that × and ÷ are not letters, so I removed them from these classes.
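
The reason the two symbols have to be cut out explicitly is that × (U+00D7) and ÷ (U+00F7) both sit inside the À-ÿ range (U+00C0 to U+00FF). Here is a minimal sketch to check that; the constant and function names are made up for illustration, and only the character classes come from the text above.

```typescript
// Illustrative names; only the three character classes are taken from the thread.
const originalRange = /^[A-Za-zÀ-ÿ]$/;   // the range from the question
const upperClass = /^[A-ZÀ-ÖØ-Þ]$/;      // capital letters only
const lowerClass = /^[a-zß-öø-ÿ]$/;      // lowercase letters only

// Classifies a single character against the three classes.
function classify(ch: string): string {
  if (upperClass.test(ch)) return "upper";
  if (lowerClass.test(ch)) return "lower";
  // In the original range but in neither split class: × or ÷.
  return originalRange.test(ch) ? "symbol" : "outside";
}

const samples = ["A", "\u00C9", "e", "\u00F2", "\u00DF", "\u00D7", "\u00F7", ","];
for (const ch of samples) {
  console.log(ch, "=>", classify(ch));
}
```

With these samples, É is reported as upper, ò and ß as lower, × and ÷ as symbol, and the comma as outside.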

To make sure the whole string consists of these letters only, and the first char is not an uppercase letter, use

/^[a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*$/

See the regex demo.

JS demo:

var strs = ['foo','fòo','Foo',' ','.',','];
var rx = /^[a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*$/;
for (var s of strs) {
  console.log(s,"=>",rx.test(s));
}
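
Since the question uses TypeScript, the same pattern can be dropped into the isProperWord function. This is a sketch assuming a plain standalone function is acceptable; the constant name PROPER_WORD is my own.

```typescript
// Pattern from the answer: the first char must be a lowercase letter,
// the rest may be any letter from the range (× and ÷ excluded).
const PROPER_WORD = /^[a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*$/;

function isProperWord(word: string): boolean {
  // No g flag, so test() keeps no lastIndex state between calls.
  return PROPER_WORD.test(word);
}

const inputs = ["foo", "Foo", "f\u00F2o", " ", ".", ","];
for (const input of inputs) {
  console.log(JSON.stringify(input), "=>", isProperWord(input));
}
```

Only "foo" and "fòo" return true here. Note that a capital letter after the first character (as in "fooBar") is still accepted, because only the first character is restricted.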

To extract words, use custom boundaries:

var s = 'foo,fòo,Foo';
var rx = /(?:[^A-Za-zÀ-ÖØ-öø-ÿ]|^)([a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*)(?![A-Za-zÀ-ÖØ-öø-ÿ])/g;
var m, res=[];
while(m=rx.exec(s)) {
  res.push(m[1]);
}
console.log(res);
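
The leading (?:[^...]|^) group stands in for a lookbehind: it consumes one non-letter (or matches at the start of the string), which is why the word itself is read from group 1. The trailing boundary is a lookahead, so the separator after a word is not consumed and is still available as the leading boundary of the next match. A typed wrapper around the same loop might look like the sketch below; the function name extractLowercaseWords is hypothetical.

```typescript
// Hypothetical helper wrapping the exec loop shown above.
function extractLowercaseWords(text: string): string[] {
  // A fresh regex per call, so lastIndex from an earlier call cannot leak in.
  const rx = /(?:[^A-Za-zÀ-ÖØ-öø-ÿ]|^)([a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*)(?![A-Za-zÀ-ÖØ-öø-ÿ])/g;
  const words: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = rx.exec(text)) !== null) {
    words.push(m[1]); // group 1 is the word without the leading separator
  }
  return words;
}

console.log(extractLowercaseWords("foo,f\u00F2o,Foo"));
console.log(extractLowercaseWords("the Quick brown Fox"));
```

The first call yields foo and fòo, the second yields the and brown; the capitalized words are skipped in both.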
