Reputation: 2084
Trying to make a regex that grabs all words like lets just say, chicken, that are not in brackets. So like
chicken
Would be selected but
[chicken]
Would not. Does anyone know how to do this?
Upvotes: 4
Views: 178
Reputation: 14699
String template = "[chicken]";
String pattern = "\\G(?<!\\[)(\\w+)(?!\\])";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(template);
while (m.find())
{
System.out.println(m.group());
}
It uses a combination of negative look-behind and negative look-aheads and boundary matchers.
(?<!\\[) //negative look behind
(?!\\]) //negative look ahead
(\\w+) //capture group for the word
\\G //is a boundary matcher for marking the end of the previous match
(please read the following edits for clarification)
EDIT 1:
If one needs to account for situations like:
"chicken [chicken] chicken [chicken]"
We can replace the regex with:
String regex = "(?<!\\[)\\b(\\w+)\\b(?!\\])";
EDIT 2:
If one also needs to account for situations like:
"[chicken"
"chicken]"
As in one still wants the "chicken"
, then you could use:
String pattern = "(?<!\\[)?\\b(\\w+)\\b(?!\\])|(?<!\\[)\\b(\\w+)\\b(?!\\])?";
Which essentially accounts for the two cases of having only one bracket on either side. It accomplishes this through the |
which acts as an or, and by using ?
after the look-ahead/behinds, where ?
means 0 or 1 of the previous expression.
Upvotes: 7
Reputation: 29874
Without look around:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MatchingTest
{
private static String x = "pig [cow] chicken bull] [grain";
public static void main(String[] args)
{
Pattern p = Pattern.compile("(\\[?)(\\w+)(\\]?)");
Matcher m = p.matcher(x);
while(m.find())
{
String firstBracket = m.group(1);
String word = m.group(2);
String lastBracket = m.group(3);
if ("".equals(firstBracket) && "".equals(lastBracket))
{
System.out.println(word);
}
}
}
}
Output:
pig
chicken
A bit more verbose, sure, but I find it more readable and easier to understand. Certainly simpler than a huge regular expression trying to handle all possible combinations of brackets.
Note that this won't filter out input like [fence tree grass]
; it will indicate that tree
is a match. You cannot skip tree
in that without a parser. Hopefully, this is not a case you need to handle.
Upvotes: 0
Reputation: 22457
Use this:
/(?<![\[\w])\w+(?![\w\]])/
i.e., consecutive word characters with no square bracket or word character before or after.
This needs to check both left and right for both a square bracket and a word character, else for your input of [chicken]
it would simply return
hicke
Upvotes: 0
Reputation: 183311
I guess you want something like:
final Pattern UNBRACKETED_WORD_PAT = Pattern.compile("(?<!\\[)\\b\\w+\\b(?!])");
private List<String> findAllUnbracketedWords(final String s) {
final List<String> ret = new ArrayList<String>();
final Matcher m = UNBRACKETED_WORD_PAT.matcher(s);
while (m.find()) {
ret.add(m.group());
}
return Collections.unmodifiableList(ret);
}
Upvotes: 2