Reputation: 13507
I'm trying to write regex for the following situations:
badword%
%badword
%badword%
The % signs differ, depending on where they are. A % at the front needs a lookbehind to match letters preceding the word badword until it reaches a non-letter. Likewise, any % that is not at the front needs a lookahead to match letters following the word badword until it hits a non-letter.
Here's what I'm trying to achieve. If I have the following:
Just a regular superbadwording sentece.
badword # should match "badword", easy enough
badword% # should match "badwording"
%badword% # should match "superbadwording"
At the same time, if I have a similar sentence:
Here's another verybadword example.
badword # should match "badword", easy enough
badword% # should also match "badword"
%badword% # should match "verybadword"
I don't want to use spaces as the assertion capture groups. Assume that I want to capture \w.
Here's what I have so far, in Java:
String badword = "%badword%";
String _badword = badword.replace("%", "");
badword = badword.replaceAll("^(?!%)%", "(?=\w)"); // match a % NOT at the beginning of a string, replace with look ahead that captures \w, not working
badword = badword.replaceAll("^%", "(?!=\w)"); // match a % at the beginning of a string, replace it with a look behind that captures \w, not working
System.out.println(badword); // ????
So, how can I accomplish this?
PS: Please don't assume the %'s are forced to the start and end of a match. If a % is the first character, then it will need a lookbehind; any and all other %'s are lookaheads.
Upvotes: 3
Views: 5947
Reputation: 14699
badword = badword.replaceAll("^%", "(?!=\w)");
// match a % at the beginning of a string, replace it with a look behind
//that captures \w, not working
(?!=\w) is a negative lookahead for the literal text =\w, but it seems like you want a positive lookbehind. Secondly, lookaheads and lookbehinds are zero-width assertions: they check a condition without consuming or capturing any text. So if I'm correct in my interpretation, you want "(?<=(\\w+))". You need the additional () for capturing.
For your first part, the replacement would be "(?=(\\w+))", and the first argument should be "(?<!^)%".
PS: In a Java string literal you need two backslashes, i.e. \\w, and you seem to want to match multiple characters, no? If so, you would need \\w+. Also, if you don't want to do this for every occurrence, then I suggest using String.format() instead of replaceAll().
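To illustrate the zero-width point, here is a minimal sketch (the class name LookaheadDemo and the helper firstMatch are made up for this example). It shows only the lookahead half: the text inside the lookahead is available through group 1, but it is not part of the overall match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {

    // Hypothetical helper: returns {whole match, group 1} for the first hit,
    // or null if the regex does not match anywhere in the text.
    static String[] firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String text = "Just a regular superbadwording sentece.";
        String[] r = firstMatch("badword(?=(\\w+))", text);
        System.out.println(r[0]); // badword (the lookahead consumed nothing)
        System.out.println(r[1]); // ing (captured inside the lookahead)
    }
}
```

Note that with \\w+ the lookahead requires at least one word character, so on "Here's another verybadword example." this pattern finds nothing at all. If badword% should still match a bare "badword", the quantifier would presumably have to be \\w* instead.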
Upvotes: 3
Reputation: 153
From your question it doesn't seem necessary to use lookaround, so you could just replace all % with \w*.
Snippet:
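import java.util.regex.Matcher;
import java.util.regex.Pattern;

String tested = "Just a regular superbadwording sentece.";
String bad = "%badword%";
// In a replacement string, "\\\\w*" yields the two characters \w followed by *
bad = bad.replaceAll("%", "\\\\w*");
Pattern p = Pattern.compile(bad);
Matcher m = p.matcher(tested);
while (m.find()) {
    String found = m.group(); // the whole match, here "superbadwording"
    System.out.println(found);
}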
Note that \w doesn't match characters such as # or -, so I think \S is better here.
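As a slightly more defensive variant of the snippet above, the following sketch wraps the same idea in two helpers (WildcardMatcher, toPattern and findAll are names invented for this example). It assumes every %, leading or not, should simply expand to \w*, and it passes the literal parts through Pattern.quote so that regex metacharacters in the bad word are treated as plain text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WildcardMatcher {

    // Hypothetical helper: converts a %-wildcard term into a compiled regex.
    static Pattern toPattern(String wildcard) {
        // A limit of -1 keeps empty leading/trailing parts, so a % at either edge is preserved.
        String[] parts = wildcard.split("%", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("\\w*"); // each % becomes "zero or more word characters"
            }
            if (!parts[i].isEmpty()) {
                regex.append(Pattern.quote(parts[i])); // literal text, metacharacters escaped
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Hypothetical helper: collects every match of the wildcard term in the text.
    static List<String> findAll(String wildcard, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = toPattern(wildcard).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Just a regular superbadwording sentece.";
        System.out.println(findAll("badword", text));   // [badword]
        System.out.println(findAll("badword%", text));  // [badwording]
        System.out.println(findAll("%badword%", text)); // [superbadwording]
    }
}
```

On "Here's another verybadword example." the same calls give badword, badword and verybadword, which is what the question asks for. Swapping \\w* for \\S* in toPattern would apply the \S suggestion.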
Upvotes: 2