Reputation: 417
I am trying to write a regex that matches words of a given minimum length that do not contain a specific substring.
Currently I have the Regex : \S{4,}(?!\w*apple\w*)
When used on the test string: I love these delicious applestoo
The regex will still match 'applestoo', which I do not want.
I can see that this is a logic error, but I do not understand how else to structure this regex. If you have a solution, please do tell me. Thank you in advance.
Edit:
This code now works for my example: (?!\w*apple\w*)\b\S{4,}\b
However, it still fails on this new example: 'logigng some testing data _______-----apple-###zx'
I have attempted to amend this by using: (?!\w*(apple|_)\w*)\b\S{4,}\b
but this does not seem to work.
Upvotes: 0
Views: 615
Reputation: 33
The regular expression to match a word of exactly 4 characters is "\b\w{4}\b". The "\b" is a word boundary: it matches the position between a word character (as defined by the \w character class) and a non-word character. The "\w{4}" matches exactly four word characters, and the final "\b" is another word boundary.
let word = "word";
let pattern = /\b\w{4}\b/;
if (pattern.test(word)) {
  console.log("match");
} else {
  console.log("no match");
}
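Since the question asks for words of a given length or more, here is a minimal sketch of how the quantifier changes the result, using the question's first test string. The variable names are only illustrative, and this does not yet exclude words containing "apple".

```javascript
const text = "I love these delicious applestoo";

// \w{4} between two word boundaries: only words of exactly four characters.
const exactlyFour = text.match(/\b\w{4}\b/g) || [];

// \w{4,}: words of four or more characters.
const fourOrMore = text.match(/\b\w{4,}\b/g) || [];

console.log(exactlyFour); // [ 'love' ]
console.log(fourOrMore);  // [ 'love', 'these', 'delicious', 'applestoo' ]
```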
Upvotes: 0
Reputation: 16236
You're looking for \b(?![^\W_]*apple)[^\W_]{4,}\b (explained at regex101)
This uses [^\W_] as the character matcher, which will match any character that is not a non-word character and not an underscore. This leaves the non-underscore word characters, making it similar to [[:alnum:]] (assuming POSIX named character class support) or [0-9A-Za-z]. If you just want letters, consider [[:alpha:]] or, for just ASCII letters, [A-Za-z].
The negative lookahead, which follows the \b word boundary marker for performance reasons, states that we can't have "apple" follow zero or more of these characters (regardless of what may follow it). We then ask to match four or more of these characters and then another word boundary marker.
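To illustrate why the position of the lookahead matters, here is a rough JavaScript sketch comparing the question's first attempt with the pattern above on the question's first test string. The variable names are just for illustration.

```javascript
const text = "I love these delicious applestoo";

// Question's first attempt: \S{4,} greedily consumes the whole word first,
// so the lookahead only inspects what comes AFTER "applestoo".
const lookaheadAfter = text.match(/\S{4,}(?!\w*apple\w*)/g) || [];

// Lookahead placed at the start of the word: it inspects the word itself
// before any characters are consumed.
const lookaheadBefore = text.match(/\b(?![^\W_]*apple)[^\W_]{4,}\b/g) || [];

console.log(lookaheadAfter);  // [ 'love', 'these', 'delicious', 'applestoo' ]
console.log(lookaheadBefore); // [ 'love', 'these', 'delicious' ]
```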
In the following command-line demonstration, I've used grep -Po. -P causes grep to use its PCRE interpreter (from libpcre) and -o causes it to show only matches, with each match on its own line:
$ echo 'logigng some testing data _______-----apple-###zx' \
|grep -Po '\b(?![^\W_]*apple)[^\W_]{4,}\b'
logigng
some
testing
data
$
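If you are working in JavaScript rather than with grep, the same pattern should behave the same way with the g flag. This is only a sketch: findWords is a hypothetical helper name, and it assumes the default (non-Unicode) meaning of \W and \b.

```javascript
// Hypothetical helper: returns alphanumeric words of four or more
// characters that do not contain "apple".
function findWords(text) {
  const pattern = /\b(?![^\W_]*apple)[^\W_]{4,}\b/g;
  // String.prototype.match returns null when nothing matches.
  return text.match(pattern) || [];
}

const words = findWords("logigng some testing data _______-----apple-###zx");

// One match per line, mirroring grep -o.
console.log(words.join("\n"));
```

The underscores are skipped because [^\W_] excludes them, and "apple" is rejected by the lookahead at its leading word boundary.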
Upvotes: 2