I am struggling to understand the word boundary \b in regular expressions. I read that there are three conditions under which \b matches.
I am trying to print the start index of each match using Java's Matcher.start() method, which returns the start index of the previous match:
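import java.util.regex.*;

class Quetico {
    public static void main(String[] args) {
        // args[0] is the regex, args[1] is the text to search
        Pattern p = Pattern.compile(args[0]);
        Matcher m = p.matcher(args[1]);
        System.out.print("match positions: ");
        // \b is zero-width, so each find() reports a position where start() == end()
        while (m.find()) {
            System.out.print(m.start() + " ");
        }
        System.out.println();
    }
}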
% java Quetico "\b" "^23 *$76 bc"
//string: ^23 *$76 bc pattern:\b
//index : 01234567890
produces: 1 3 5 6 7 9
I'm having trouble understanding why it produces this result, because I can't see the pattern. I've tried looking at the inverse, \B, which produces 0 2 4 8, but that doesn't make it any clearer. Any help clarifying this would be appreciated.
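The three conditions usually given for \b are: before the first character if it is a word character, after the last character if it is a word character, and between two characters where exactly one is a word character. A minimal sketch of those rules follows, assuming the ASCII word-character set [a-zA-Z0-9_] (which agrees with Java's \b for plain ASCII text like this). The class WordBoundaryRules and its methods are hypothetical names for illustration; the trick is to treat "outside the string" as a non-word character, which folds all three conditions into one test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: classifies every position 0..length of a string using
// the three textbook conditions for \b. ASSUMPTION: word characters are the
// ASCII set [a-zA-Z0-9_].
public class WordBoundaryRules {

    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '_';
    }

    // Position i lies between s.charAt(i - 1) and s.charAt(i).
    static boolean isBoundary(String s, int i) {
        // Outside the string counts as "not a word character".
        boolean wordBefore = i > 0 && isWordChar(s.charAt(i - 1));
        boolean wordAfter = i < s.length() && isWordChar(s.charAt(i));
        // A boundary means a word character on exactly one side.
        return wordBefore != wordAfter;
    }

    // wantBoundary = true lists \b positions; false lists \B positions.
    static List<Integer> positions(String s, boolean wantBoundary) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            if (isBoundary(s, i) == wantBoundary) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "^23 *$76 bc"; // the string as typed in the question
        System.out.println("\\b: " + positions(s, true));
        System.out.println("\\B: " + positions(s, false));
    }
}
```

For ^23 *$76 bc these rules predict \b at 1 3 6 8 9 11 and \B at 0 2 4 5 7 10, which is not what the program printed. Every position is in exactly one of the two lists, so \b and \B are complements; the mismatch with the observed output suggests the program never received the string as typed.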
The issue isn't Java here; it's the Unix shell. When you put text between double quote marks on the command line, most of the special shell characters, such as * and ?, are no longer special, except for variable interpolation. (Some other characters, like !, may also stay special depending on which shell you're using.) Thus, if you type
% command "this $variable is interesting"
and you've set variable to value, your command will be called with a single argument, this value is interesting.
In your case, the shell reads $76 as the positional parameter $7 followed by a literal 6. $7 would normally hold the seventh argument of a shell script, and it is expanded even though you're not in a script; since it isn't set, it becomes an empty string, and the result is the same as if you had run
% java Quetico "\b" "^23 *6 bc"
which gives me 1 3 5 6 7 9 when I use that string as a literal in a Java program instead of passing it on the command line.
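That check can be reproduced without any shell by handing string literals straight to the regex engine. This is a minimal sketch; the class BoundaryLiterals and the helper matchStarts are hypothetical names. In Java source the regex \b has to be written "\\b", because the compiler consumes one backslash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: runs a regex over a string literal, so no shell
// quoting or variable expansion can alter the input.
public class BoundaryLiterals {

    static List<Integer> matchStarts(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start()); // zero-width match, so start() == end()
        }
        return starts;
    }

    public static void main(String[] args) {
        // "\\b" in Java source is the two-character regex \b.
        System.out.println(matchStarts("\\b", "^23 *6 bc"));   // what the shell passed
        System.out.println(matchStarts("\\b", "^23 *$76 bc")); // what was typed
    }
}
```

The first line should print [1, 3, 5, 6, 7, 9], the output from the question, and the second [1, 3, 6, 8, 9, 11]. Using "\\B" on the first string gives [0, 2, 4, 8], which also matches the \B result reported in the question.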
To prevent $ from being interpreted by the shell, you need to use single quote marks:
% java Quetico "\b" '^23 *$76 bc'
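To confirm what the JVM actually receives, a small hypothetical helper can print each argument with its length before any regex is involved. ShowArgs is an illustrative name, not part of the original program.

```java
// Hypothetical debugging helper: prints each command-line argument exactly as
// the JVM received it, i.e. after the shell has applied quoting and expansion.
public class ShowArgs {
    public static void main(String[] args) {
        for (int i = 0; i < args.length; i++) {
            // Brackets make leading or trailing spaces visible.
            System.out.println("args[" + i + "] = [" + args[i] + "] (length " + args[i].length() + ")");
        }
    }
}
```

For example:

% java ShowArgs "\b" "^23 *$76 bc"
% java ShowArgs "\b" '^23 *$76 bc'

In bash, the first run should report args[1] as ^23 *6 bc (length 9) and the second as ^23 *$76 bc (length 11); other shells may handle the unset $7 differently.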