Reputation: 2686
I have this regexp
/[A-Za-zÀ-ÿ]+/g
that matches 'words' of unlimited length made up of these characters.
What if I want to exclude words starting with a capital letter?
I tried
/(^[A-Z])[A-Za-zÀ-ÿ]+/g
but it doesn't seem to work. I can't use things like \w because it doesn't include diacritics.
EDIT: the language in use is TypeScript, so the regex engine is JavaScript's (which doesn't allow lookbehind, for example). Sorry for not mentioning this.
EDIT: the input given can be something like
"foo" //should match foo and return true
"Foo" //should not match foo and return false
"fòo" //should match fòo and return true
" " //should not match and return false
"." //should not match and return false
"," //should not match and return false
Code (TypeScript) that matches words, without the capital-letter check:
isProperWord(word){
  /* rejects
     - strings that are not words (symbols, spaces, etc.)
     - names (words starting with a capital letter)
  */
  if(word.match(/[A-Za-zÀ-ÿ]+/g)){
    return true;
  }else{
    return false;
  }
}
Upvotes: 0
Views: 179
Reputation: 627419
To match all capital letters from your initial range, you may use the [A-ZÀ-ÖØ-Þ] character class. To match all lowercase letters, use [a-zß-öø-ÿ]. Note that × and ÷ are not letters, so I removed them from these classes.
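A small sketch of why the original À-ÿ range is too broad: × (U+00D7) and ÷ (U+00F7) both fall inside it. The constant names below are only illustrative, and the classes are written with \u escapes so the snippet does not depend on the file encoding.

```typescript
// The question's range: À-ÿ spans U+00C0 to U+00FF, operators included.
const broadRange = /^[A-Za-z\u00C0-\u00FF]+$/;
// Same block with U+00D7 (×) and U+00F7 (÷) carved out.
const lettersOnly = /^[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]+$/;

const timesSign = "\u00D7";    // ×
const divisionSign = "\u00F7"; // ÷

console.log(broadRange.test(timesSign));     // true
console.log(broadRange.test(divisionSign));  // true
console.log(lettersOnly.test(timesSign));    // false
console.log(lettersOnly.test(divisionSign)); // false
console.log(lettersOnly.test("f\u00F2o"));   // true (fòo)
```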
To make sure the whole string consists of these letters only, and the first char is not an uppercase letter, use
/^[a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*$/
See the regex demo.
JS demo:
var strs = ['foo','fòo','Foo',' ','.',','];
var rx = /^[a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*$/;
for (var s of strs) {
  console.log(s,"=>",rx.test(s));
}
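Since the question is in TypeScript, here is a minimal sketch of how isProperWord could be written around the same pattern. The standalone function form and the PROPER_WORD constant name are assumptions (the original is a method on some class), and the pattern is written with \u escapes that are equivalent to the literal characters above.

```typescript
// First char: lowercase only (a-z, ß-ö, ø-ÿ).
// Remaining chars: any letter of the range (A-Z, a-z, À-Ö, Ø-ö, ø-ÿ).
// No /g flag, so test() keeps no state between calls.
const PROPER_WORD =
  /^[a-z\u00DF-\u00F6\u00F8-\u00FF][A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]*$/;

function isProperWord(word: string): boolean {
  // Rejects non-words (symbols, spaces, empty string) and
  // words that start with a capital letter.
  return PROPER_WORD.test(word);
}

const samples: string[] = ["foo", "Foo", "f\u00F2o", " ", ".", ","];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "=>", isProperWord(sample));
}
```

Returning the result of test() directly also replaces the if/else that returned true or false.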
To extract words, use custom boundaries:
var s = 'foo,fòo,Foo';
var rx = /(?:[^A-Za-zÀ-ÖØ-öø-ÿ]|^)([a-zß-öø-ÿ][A-Za-zÀ-ÖØ-öø-ÿ]*)(?![A-Za-zÀ-ÖØ-öø-ÿ])/g;
var m, res=[];
while(m=rx.exec(s)) {
  res.push(m[1]);
}
console.log(res);
Upvotes: 1
Reputation: 189830
The expression ^[A-Z] means match an uppercase character at the beginning of a line. You probably meant to type [^A-Z], which matches a character that is not an uppercase letter between A and Z, but that still doesn't help, because the regex engine will find a character somewhere which matches this and be satisfied. (For example, a space trivially matches this: it's a character, and it's not in the range A through Z.)
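To make the difference concrete, here is a short sketch in TypeScript (the variable names are mine, not from the question):

```typescript
const startsUpper = /^[A-Z]/; // anchor: an uppercase letter at the very start
const anyNonUpper = /[^A-Z]/; // negated class: any single character outside A-Z

console.log(startsUpper.test("Foo")); // true
console.log(startsUpper.test("foo")); // false

// The negated class is satisfied by any non-uppercase character anywhere,
// so it does not reject capitalized words.
console.log(anyNonUpper.test("Foo")); // true (the first 'o' matches)
console.log(anyNonUpper.test(" "));   // true (a space is not in A-Z)
console.log(anyNonUpper.test("FOO")); // false (nothing outside A-Z)
```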
If you use a regex dialect which understands word boundaries with \b, try
/\b[a-z][A-Za-z]*/
to match a token which has a word boundary on its left, and a lowercase character adjacent to it. (I am ignoring your locale extension, which is not portable and possibly not well-defined.)
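A sketch of how this ASCII-only pattern behaves in JavaScript/TypeScript, where \b is defined in terms of \w (ASCII letters, digits and underscore) when no Unicode-aware flags are used. The helper name lowercaseTokens is hypothetical, and /g is added only so that match() lists every hit.

```typescript
function lowercaseTokens(text: string): string[] {
  // match() with /g returns all matches, or null when there are none.
  return text.match(/\b[a-z][A-Za-z]*/g) || [];
}

console.log(lowercaseTokens("foo Bar baz")); // [ 'foo', 'baz' ]
console.log(lowercaseTokens("Foo"));         // []

// "fòo" with a precomposed ò (U+00F2): ò is not a \w character, so it
// acts as a boundary and the word is split into two tokens.
console.log(lowercaseTokens("f\u00F2o"));    // [ 'f', 'o' ]
```

The last line shows why words with diacritics need custom character classes rather than \b in this engine.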
In isolation, the /g flag doesn't do anything. If you have a language which supports it, and use a regex in a while loop or similar, it will cause the engine to return all the matches in the string, one at a time, inside the loop; but without further context, we have no idea whether that is actually true here.
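For the JavaScript engine specifically, a small sketch of where /g makes a visible difference (variable names are illustrative). When match() is only used as a truthy/falsy check, as in the question's if, the flag does not change the outcome.

```typescript
const text = "foo bar baz";

// Without /g, match() reports only the first match.
const first = text.match(/[a-z]+/);
// With /g, match() returns every match.
const all = text.match(/[a-z]+/g);

// With /g, exec() resumes from lastIndex, yielding one match per call.
const rx = /[a-z]+/g;
const viaLoop: string[] = [];
let m: RegExpExecArray | null;
while ((m = rx.exec(text)) !== null) {
  viaLoop.push(m[0]);
}

console.log(first ? first[0] : null); // foo
console.log(all);                     // [ 'foo', 'bar', 'baz' ]
console.log(viaLoop);                 // [ 'foo', 'bar', 'baz' ]
```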
Upvotes: 3