SegFault

Reputation: 2546

Make regex not match the last word

Question:

How do I capture all words in a string except the last word, unless the last word is followed by , or .?

For example, for "jumps over the lazy dog" capture "jumps over the lazy". But for "jumps over the lazy dog." the word "dog" also has to be captured.

NOTE: this is JavaScript regex.


What I have done:

/([\w']+\b)/g captures all the words, but it also captures dog when no punctuation follows it. (I used [\w'] to include apostrophes in words.)

I suspect the answer has something to do with $?

Upvotes: 1

Views: 1517

Answers (3)

user557597

Reputation:

This appears to work: \w+\b(?=\W+\w|\s*[,.])
though I'm sure there are other ways.

Formatted

 \w+ \b                 # word in string
 (?=                    # Check 
      \W+ \w                 # Not the last word
   |  \s* [,.]               # or, a word followed by a dot or comma
 )

// Declare the result explicitly instead of creating an implicit global.
var matches = "asdf abcd  +=&^$#@.+)(*&".match(/\w+\b(?=\W+\w|\s*[,.])/g);
if (matches)
   console.log( matches );
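
To see how this lookahead behaves on the two sentences from the question, here is a minimal sketch. The pattern is the one from this answer; the helper name wordsExceptBareLast is made up for illustration.

```javascript
// Pattern from the answer above, unchanged.
const pattern = /\w+\b(?=\W+\w|\s*[,.])/g;

// Hypothetical helper: returns every match, or an empty array if there is none.
function wordsExceptBareLast(text) {
  return text.match(pattern) || [];
}

// No trailing punctuation: "dog" is not followed by another word, a comma or a dot.
console.log(wordsExceptBareLast("jumps over the lazy dog"));
// Trailing dot: the second alternative \s*[,.] lets "dog" through.
console.log(wordsExceptBareLast("jumps over the lazy dog."));
```

The first call should log the four words before "dog", the second should log all five.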

Upvotes: 0

Srdjan M.

Reputation: 3405

Regex: (?![^,. ]+$)\w+

To allow other characters after the last word, add them to the character class [^,. ]

Details:

  • (?!) Negative Lookahead
  • [^] Match a single character not present in the list
  • \w matches any word character (equal to [a-zA-Z0-9_])
  • + Matches between one and unlimited times
  • $ Asserts position at the end of the string (or of each line when the m flag is set)

function myFunction() {
  console.clear();
  var re = /(?![^,. ]+$)\w+/g;
  var s = document.getElementById("input").value;
  var m;

  do {
      m = re.exec(s);
      if (m) {
          console.log(m[0]);
      }
  } while (m);
}
<form action="javascript:myFunction()">
  <input id="input" type="text" name="lastname" value="jumps over the lazy dog."><br><br>
  <input type="submit" value="Submit">
</form>
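
The same pattern can also be tried without a form or the DOM, for example in Node. This is only a sketch: the function name matchWords is an assumption, and it uses String.prototype.matchAll in place of the exec loop above.

```javascript
// Pattern from the answer above, unchanged.
const re = /(?![^,. ]+$)\w+/g;

// Hypothetical helper: collects the full text of every match into an array.
function matchWords(s) {
  return Array.from(s.matchAll(re), function (m) {
    return m[0];
  });
}

console.log(matchWords("jumps over the lazy dog"));  // "dog" should be left out
console.log(matchWords("jumps over the lazy dog.")); // "dog" should be included
```

The negative lookahead fails at every position inside a trailing word that runs to the end of the string, so no fragment of "dog" is matched in the first call.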

Upvotes: 1

ctwheels

Reputation: 22817

var r = /\w+\b(?!$)/gm
var a = [
  "jumps over the lazy dog",
  "jumps over the lazy dog."
]

a.forEach(function(s) {
  var x = [], m // declare m locally instead of leaking a global
  while ((m = r.exec(s)) !== null) {
    x.push(m[0])
  }
  console.log(x)
})

\w+\b(?!$)
  • \w+ Matches one or more word characters
  • \b Assert position as a word boundary
  • (?!$) Negative lookahead ensuring what follows is not the end of the line
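
The \b in this pattern matters more than it may look. As a small sketch (the variable names are made up for illustration), compare the pattern with and without it:

```javascript
const sentence = "jumps over the lazy dog";

// With \b: once (?!$) rejects "dog", backtracking to "do" fails too,
// because there is no word boundary between "o" and "g".
const withBoundary = sentence.match(/\w+\b(?!$)/g) || [];

// Without \b: the engine backtracks to "do", which is not at the end,
// so a fragment of the last word is matched.
const withoutBoundary = sentence.match(/\w+(?!$)/g) || [];

console.log(withBoundary);    // [ 'jumps', 'over', 'the', 'lazy' ]
console.log(withoutBoundary); // [ 'jumps', 'over', 'the', 'lazy', 'do' ]
```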

If you need to ensure the last word is followed by only . or ,, you can use \w+\b(?![^.,]?$) instead. This excludes a final word that is followed by nothing or by a single character other than . or ,. The following snippet shows this alternative in practice.

var r = /\w+\b(?![^.,]?$)/gm
var a = [
  "jumps over the lazy dog",
  "jumps over the lazy dog.",
  "jumps over the lazy dog;"
]

a.forEach(function(s) {
  var x = [], m // declare m locally instead of leaking a global
  while ((m = r.exec(s)) !== null) {
    x.push(m[0])
  }
  console.log(x)
})
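
The question used [\w'] so that apostrophes count as part of a word, while the patterns here use plain \w+. One possible adaptation, which is not part of the original answer, is sketched below. Because \b does not treat an apostrophe as a word character, the sketch replaces \b with an extra lookahead alternative; the names apostrophePattern and wordsWithApostrophes are made up for illustration.

```javascript
// Assumption: a "word" is a run of word characters and apostrophes.
// The lookahead rejects a match that is
//   - cut off in the middle of a word (next char is still [\w']), or
//   - at the end of the line, optionally followed by one char other than . or ,
const apostrophePattern = /[\w']+(?![\w']|[^.,]?$)/gm;

// Hypothetical helper: returns every match, or an empty array if there is none.
function wordsWithApostrophes(text) {
  return text.match(apostrophePattern) || [];
}

console.log(wordsWithApostrophes("don't wake the lazy dog"));
console.log(wordsWithApostrophes("don't wake the lazy dog."));
console.log(wordsWithApostrophes("don't wake the lazy dog;"));
```

Only the second call should include "dog"; all three should keep "don't" as a single word.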

Upvotes: 3
