Reputation: 10303
This is a follow-up to my previous question.
I would like to find a minimal sequence of characters of length > N that starts at a word boundary and ends at the end of input.
For example:
N = 5, input = "aaa bbb cccc dd" result = "cccc dd"
I tried \b.{5,}?$ but it matches the whole input rather than the minimal part.
What regex would you suggest?
Upvotes: 1
Views: 221
Reputation: 75242
The problem this time isn't greediness, it's eagerness. Regexes naturally try to find the earliest possible match, and getting them to find the last one can be tricky. The easiest way is usually the one @Arcadien demonstrated: use .* to gobble up the whole string, then use backtracking to find the match on the rebound.
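To make the difference concrete, here is a small sketch contrasting the lazy attempt from the question with a greedy prefix. The second pattern is only my illustration of the technique and is not necessarily the exact regex @Arcadien posted.

```javascript
const input = "aaa bbb cccc dd";

// Lazy quantifier: the engine still starts at the earliest \b (index 0)
// and then expands until $ is reached, so the whole string matches.
const lazy = input.match(/\b.{5,}?$/);
console.log(lazy[0]); // "aaa bbb cccc dd"

// Greedy .* consumes everything first, then backtracks one position at a
// time until the rest of the pattern fits, so group 1 is the last candidate.
const greedy = input.match(/^.*(\b.{5,}$)/);
console.log(greedy[1]); // "cccc dd"
```

The wanted text ends up in capture group 1 rather than in the overall match, because the overall match also includes whatever .* consumed.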
I have some questions about your requirements, though. \b can match the beginning or the end of a word, so if (for example) N=5 and the string ends with "foo1 bar2", the result would be " bar2" (notice the leading space). Do you really want a match that starts at the end of a word, or should it drop the space or back up to the beginning of "foo1"? Also, will all words consist entirely of word characters? If there are any non-word characters, \b will be able to match in even more surprising places.
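A quick way to see both effects is to run a \b-based pattern on a couple of inputs. The helper below is hypothetical (its name and the second test string are mine), and it assumes the greedy-prefix form of the regex.

```javascript
// Hypothetical helper: last suffix of at least n characters that starts at \b.
function lastBoundarySuffix(text, n) {
  const re = new RegExp("^.*(\\b.{" + n + ",}$)");
  const m = re.exec(text);
  return m ? m[1] : null;
}

// \b also matches at the END of "foo1", so the suffix starts with the space.
console.log(JSON.stringify(lastBoundarySuffix("foo1 bar2", 5))); // " bar2"

// With a non-word character inside a token, \b matches in the middle of it.
console.log(JSON.stringify(lastBoundarySuffix("ab c.def", 3))); // "def"
```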
For the regex below, I redefined "word" to mean any complete chunk of non-whitespace characters. The .* starts out by consuming the whole string, then the lookahead, (?=.{5,}), forces it to backtrack at least five positions before it tries to match anything. The \s forces the match to start at the beginning of a word, so the rest of the regex captures one or more complete words.
/^.*(?=.{5,})\s(\S+(?:\s+\S+)*$)/
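var N = 5;
// Build the pattern from a string so N can vary; exec() needs a RegExp object, not a string.
var regex = new RegExp("^.*(?=.{" + N + ",})\\s(\\S+(?:\\s+\\S+)*$)");
// subject is the input string to search.
var match = regex.exec(subject);
var result = (match != null) ? match[1] : "";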
This regex won't match anything that's less than five characters long or doesn't contain whitespace. If that's a problem, let me know and I'll tweak it.
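For anyone who wants to try the edge cases, here is a minimal sketch that wraps the same pattern in a function. The function name lastWords is hypothetical and the sample inputs are mine.

```javascript
// Hypothetical wrapper around the regex above.
function lastWords(subject, n) {
  const regex = new RegExp("^.*(?=.{" + n + ",})\\s(\\S+(?:\\s+\\S+)*$)");
  const match = regex.exec(subject);
  return match !== null ? match[1] : "";
}

console.log(lastWords("aaa bbb cccc dd", 5)); // "cccc dd"
console.log(lastWords("foo1 bar2", 5));       // "bar2"
console.log(lastWords("abcdefgh", 5));        // "" (no whitespace)
console.log(lastWords("ab", 5));              // "" (too short)
```

Note that the lookahead counts the separating whitespace character, which is then matched by \s outside the capture group. That is why "foo1 bar2" yields the four-character "bar2" with N = 5; if the captured text itself must reach the minimum length, the lookahead would need adjusting.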
Upvotes: 3
Reputation: 2159
Try using .{5} (any character, exactly 5 times) instead of .{5,} (any character, 5 or more times).
The following worked for me using regexpal: \w*.{5}$ (improved by @nhahtdh).
This matches the last 5 characters of the input, plus any word characters immediately before them, so the match is extended back to the start of a word.
Results:
String "AAAA BBBB CCCC DDEEE"
Match: "DDEEE"
String "AAAA BBBB CCCC DD"
Match: "CCCC DD"
String "AAAA BBBB CCCC"
Match: "BBBB CCCC"
String "AAAA"
Match: null
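The results above can be reproduced with a few lines of JavaScript. This is just a sketch; the helper name tailMatch is mine.

```javascript
// Hypothetical helper: returns the text matched by \w*.{5}$, or null.
function tailMatch(text) {
  const m = text.match(/\w*.{5}$/);
  return m ? m[0] : null;
}

console.log(tailMatch("AAAA BBBB CCCC DDEEE")); // "DDEEE"
console.log(tailMatch("AAAA BBBB CCCC DD"));    // "CCCC DD"
console.log(tailMatch("AAAA BBBB CCCC"));       // "BBBB CCCC"
console.log(tailMatch("AAAA"));                 // null
```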
Upvotes: 1
Reputation: 56819
You can reverse the input with
.split("").reverse().join("")
then apply the answer from the previous question, and finally reverse the match with the same functions.
This solution doesn't take performance into account.
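A rough sketch of the idea follows. The previous question's answer isn't reproduced here, so the pattern ^.{N,}?\b is my assumption of what it would look like on the reversed string, and the function names are hypothetical.

```javascript
// Reverses a string by UTF-16 code units (this would split surrogate pairs).
function reverse(s) {
  return s.split("").reverse().join("");
}

// Hypothetical helper: shortest suffix of at least n characters that
// starts at a word boundary, found by matching a prefix of the reversed text.
function minimalSuffix(text, n) {
  // Assumed pattern: lazy prefix of n or more characters ending at \b.
  const re = new RegExp("^.{" + n + ",}?\\b");
  const m = re.exec(reverse(text));
  return m ? reverse(m[0]) : null;
}

console.log(minimalSuffix("aaa bbb cccc dd", 5)); // "cccc dd"
```

On the reversed string the wanted text is the earliest match, so the engine's eagerness works in our favour and a plain lazy quantifier is enough.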
Upvotes: 1