JensD

Reputation: 195

Understanding \b in regex

I am struggling to understand word boundary \b in regex. I read that there are three conditions for \b.

I am trying to find the start index of each match using Java's Matcher.start() method:

import java.util.regex.*;
class Quetico{
    public static void main(String[] args){
        Pattern p = Pattern.compile(args[0]);
        Matcher m = p.matcher(args[1]);
        System.out.print("match positions: ");
        while(m.find()){
            System.out.print(m.start()+" ");
        }
        System.out.println();
    }
}

% java Quetico "\b" "^23 *$76 bc"

//string: ^23 *$76 bc     pattern:\b
//index : 01234567890

produces: 1 3 5 6 7 9

I'm having trouble understanding why it produces this result, because I can't see the pattern. I've tried looking at the inverse, \B, which produces 0 2 4 8, but that doesn't make it any clearer for me. If you can help clarify this, it would be appreciated.

Upvotes: 2

Views: 380

Answers (1)

ajb

Reputation: 31699

The issue here isn't Java, it's the Linux/Unix shell. When you put text between double quote marks on the command line, most of the special shell characters such as *, ?, etc. are no longer special--except for variable interpolation. (And some other things, like ! depending on which shell flavor you're using.) Thus, if you say

% command "this $variable is interesting"

and you've set variable to value, your command will be called with one argument, this value is interesting. In your case, the shell treats $7 as a positional parameter, even though you're not in a shell script; since it isn't set to anything, it's replaced with an empty string, and the result is the same as if you had run

% java Quetico "\b" "^23 *6 bc"

which gives me 1 3 5 6 7 9 if I use that string literal in a Java program (instead of on the command line).
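
As a rough check that takes the shell out of the picture entirely, the sketch below runs the same \b search over both strings written as Java literals. The class name BoundaryCheck and the helper positions are made up for this illustration and are not part of your program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class BoundaryCheck {
    // Collects the start index of every match of regex in input.
    static List<Integer> positions(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<Integer> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // The string the program actually received once the shell dropped $7.
        System.out.println(positions("\\b", "^23 *6 bc"));   // [1, 3, 5, 6, 7, 9]
        // The string that was intended.
        System.out.println(positions("\\b", "^23 *$76 bc")); // [1, 3, 6, 8, 9, 11]
    }
}
```

The first line reproduces the output from the question; the second is what the intended string gives.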

To prevent $ from being interpreted by the shell, you need to use single quote marks:

% java Quetico "\b" '^23 *$76 bc'
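
Once the quoting is right, the positions can be checked by hand against the usual definition of \b: a position matches when exactly one of its two neighbours is a word character, with the start and end of the string counting as non-word. The sketch below is only an ASCII approximation of that rule (it treats [A-Za-z0-9_] as word characters, whereas Java's own \b has additional Unicode details), and the names ManualBoundary, isWord and boundaries are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, ASCII-only illustration of the \b rule; not Java's actual implementation.
class ManualBoundary {
    // Assumption: a word character is one of [A-Za-z0-9_].
    static boolean isWord(char c) {
        return c == '_'
            || (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z');
    }

    // Returns every index i where exactly one of s[i-1], s[i] is a word character.
    // Positions outside the string are treated as non-word.
    static List<Integer> boundaries(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            boolean before = i > 0 && isWord(s.charAt(i - 1));
            boolean after = i < s.length() && isWord(s.charAt(i));
            if (before != after) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("^23 *$76 bc")); // [1, 3, 6, 8, 9, 11]
    }
}
```

Read this way, index 0 is not a boundary because ^ is not a word character, index 1 is one because ^ is followed by 2, and index 11 is one because the string ends right after c.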

Upvotes: 3
