Reputation: 2546
Question:
How do I capture all words in a string except the last word, unless the last word is followed by , or . ?
For example, for jumps over the lazy dog capture jumps, over, the, and lazy. But for jumps over the lazy dog. the word dog also has to be captured.
NOTE: this is JavaScript regex.
What I have done:
/([\w']+\b)/g
captures all the words, but it also captures dog when no punctuation follows it. (I used [\w'] to include apostrophes in words.)
I suspect the answer has something to do with $?
Upvotes: 1
Views: 1517
Reputation:
This appears to work: \w+\b(?=\W+\w|\s*[,.])
but there are other ways, I'm sure.
Formatted
\w+ \b # word in string
(?= # Check
\W+ \w # Not the last word
| \s* [,.] # or, a word followed by a dot or comma
)
// Declare the result first so no implicit global is created
var matches = "asdf abcd +=&^$#@.+)(*&".match(/\w+\b(?=\W+\w|\s*[,.])/g);
if (matches)
console.log( matches );
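As a quick check against the two sentences from the question, the same pattern can be run through `String.prototype.match`. This is only a sketch; the helper name `captureWords` is made up for illustration and is not part of the original answer.

```javascript
// Pattern from this answer: a word that is either followed by another word,
// or by optional whitespace and then a comma or period.
var pattern = /\w+\b(?=\W+\w|\s*[,.])/g;

// Hypothetical helper: returns every match, or an empty array if there is none.
function captureWords(text) {
  return text.match(pattern) || [];
}

console.log(captureWords("jumps over the lazy dog"));  // ["jumps", "over", "the", "lazy"]
console.log(captureWords("jumps over the lazy dog.")); // ["jumps", "over", "the", "lazy", "dog"]
```

The bare last word is dropped because neither branch of the lookahead can succeed at the end of the string.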
Upvotes: 0
Reputation: 3405
Regex: (?![^,. ]+$)\w+
To allow other characters after the last word, add them to the class [^,. ]
Details:
(?!) Negative lookahead
[^] Matches a single character not present in the list
\w Matches any word character (equal to [a-zA-Z0-9_])
+ Matches between one and unlimited times
$ Asserts position at the end of a line
function myFunction() {
console.clear();
var re = /(?![^,. ]+$)\w+/g;
var s = document.getElementById("input").value;
var m;
do {
m = re.exec(s);
if (m) {
console.log(m[0]);
}
} while (m);
}
<form action="javascript:myFunction()">
<input id="input" type="text" name="lastname" value="jumps over the lazy dog."><br><br>
<input type="submit" value="Submit">
</form>
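The question also asks for apostrophes to count as part of a word. One way this might be handled, sketched here as an assumption and not as part of the original answer, is to swap `\w+` for the `[\w']+` class from the question. The names `reApostrophe` and `wordsExceptBareLast` are hypothetical, and the check uses `String.prototype.match` so it runs without the form.

```javascript
// Same negative lookahead as in this answer's pattern, but the word class
// also allows apostrophes (an adaptation, not the answer's own regex).
var reApostrophe = /(?![^,. ]+$)[\w']+/g;

// Hypothetical helper: returns every match, or an empty array if there is none.
function wordsExceptBareLast(text) {
  return text.match(reApostrophe) || [];
}

console.log(wordsExceptBareLast("don't wake the dog"));    // ["don't", "wake", "the"]
console.log(wordsExceptBareLast("don't wake the dog's,")); // ["don't", "wake", "the", "dog's"]
```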
Upvotes: 1
Reputation: 22817
var r = /\w+\b(?!$)/gm
var a = [
"jumps over the lazy dog",
"jumps over the lazy dog."
]
a.forEach(function(s) {
var x = []
var m // declared locally to avoid an implicit global
while ((m = r.exec(s)) !== null) {
x.push(m[0])
}
console.log(x)
})
\w+\b(?!$)
\w+ Matches one or more word characters
\b Asserts position at a word boundary
(?!$) Negative lookahead ensuring what follows is not the end of the line
If you need to ensure the last word is followed only by . or , you can use \w+\b(?![^.,]?$) instead. This ensures that words at the end of the line that are not followed by . or , are excluded. The following snippet shows this alternative in practice.
var r = /\w+\b(?![^.,]?$)/gm
var a = [
"jumps over the lazy dog",
"jumps over the lazy dog.",
"jumps over the lazy dog;"
]
a.forEach(function(s) {
var x = []
var m // declared locally to avoid an implicit global
while ((m = r.exec(s)) !== null) {
x.push(m[0])
}
console.log(x)
})
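Both patterns in this answer use the `m` flag, so `$` matches at the end of every line and not only at the end of the input. A minimal sketch of the difference, using a made-up two-line string (the variable names `text`, `perLine`, and `wholeInput` are illustrative only):

```javascript
var text = "jumps over the lazy dog\nthe quick brown fox.";

// With `m`, `$` also matches just before the "\n", so "dog" is excluded.
var perLine = text.match(/\w+\b(?![^.,]?$)/gm);

// Without `m`, `$` matches only at the very end of the input, so "dog" is kept.
var wholeInput = text.match(/\w+\b(?![^.,]?$)/g);

console.log(perLine);    // ["jumps", "over", "the", "lazy", "the", "quick", "brown", "fox"]
console.log(wholeInput); // ["jumps", "over", "the", "lazy", "dog", "the", "quick", "brown", "fox"]
```

If each input is always a single line, as in the snippets above, the `m` flag makes no difference to the result.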
Upvotes: 3