ARPIT PRASHANT BAHETY

Reputation: 121

Regular expression not considering space

I have the following regular expression in Java:

Pattern p = Pattern.compile("int|float|char\\s\\w");

But it still matches "intern" too.

Entire code:

package regex;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Regex {

    public static void main(String[] args) throws IOException{
        // Count the lines of new.c that contain a match
        int c = 0;
        BufferedReader bf = new BufferedReader(new FileReader("new.c"));
        String line;
        // Intended: int, float or char, followed by whitespace and a word character
        Pattern p = Pattern.compile("int|float|char\\s\\w");
        Matcher m;
        while((line = bf.readLine()) != null) {
            m = p.matcher(line);
            if(m.find()) {
                c++;
            }
        }
        System.out.println(c);
    }
}

Upvotes: 0

Views: 131

Answers (2)

Ori Shalom

Reputation: 490

Surround the options with parentheses like so:

Pattern p = Pattern.compile("(int|float|char)\\s\\w");
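A minimal sketch of that fix, run against a few made-up input lines (the class name and the typeOf helper are hypothetical, just for illustration). Because the parentheses also form capturing group 1, m.group(1) gives you the type keyword that matched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupedAlternation {
    // The parentheses make \s\w apply to every alternative.
    // They also form capturing group 1, which holds the matched type name.
    static final Pattern FIXED = Pattern.compile("(int|float|char)\\s\\w");

    // Returns the type keyword found in the line, or null if there is none.
    static String typeOf(String line) {
        Matcher m = FIXED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(typeOf("int count = 0;")); // int
        System.out.println(typeOf("float ratio;"));   // float
        System.out.println(typeOf("intern"));         // null
        System.out.println(typeOf("print x"));        // int
    }
}
```

The "print x" line shows what this version still misses: with no anchor or word boundary, the "int" at the end of "print" is found.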

Also, if you want to cover some edge cases in badly formatted code, you can use:

Pattern p = Pattern.compile("^\\s*(int|float|char)\\s+[a-zA-Z_][a-zA-Z0-9_]*\\s*");

This should cover cases where there is more than one space or tab between the type and the variable name, and variable names starting with an underscore. It also avoids matching when "int", "float" or "char" is only the end of another word, as in "print x".
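A quick way to check those edge cases is sketched below with hypothetical sample lines. It writes the pattern with \s rather than (\s|\t); the two are equivalent because \s already matches a tab. The class and method names are only for illustration.

```java
import java.util.regex.Pattern;

public class EdgeCaseDemo {
    // ^ anchors at the start of the line; \s covers both spaces and tabs.
    // The identifier must start with a letter or underscore, not a digit.
    static final Pattern DECL =
            Pattern.compile("^\\s*(int|float|char)\\s+[a-zA-Z_][a-zA-Z0-9_]*\\s*");

    // True if the line starts (after optional whitespace) with a declaration.
    static boolean declares(String line) {
        return DECL.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = {"int x;", "\tfloat\t_ratio;", "char   name", "print x", "intern", "int 1abc"};
        for (String line : lines) {
            // Show tabs as \t so the output is readable.
            System.out.println(declares(line) + "  <- " + line.replace("\t", "\\t"));
        }
    }
}
```

One trade-off of the ^ anchor: a declaration that does not start the line, such as the int in "for (int i = 0; ...)", is not matched either.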

Upvotes: 0

user557597


I assume you want to match one of the alternatives followed by a whitespace character and a word character.

But alternation has the lowest precedence of any regex operator, so your pattern is grouped like this:

 (?:
      int
   |                    # or,
      float
   |                    # or,
      char \s \w
 )

As the expanded layout shows, the \s\w applies only to the char alternative.
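To see this in practice, here is a small sketch that runs the pattern from the question against a few sample strings (the inputs and the firstMatch helper are made up for illustration). The bare int alternative matches the first three letters of "intern", while char only matches when it is followed by whitespace and a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrecedenceDemo {
    // The pattern from the question. Alternation binds loosest, so this means
    // "int" OR "float" OR "char\s\w", not (int|float|char)\s\w.
    static final Pattern ORIGINAL = Pattern.compile("int|float|char\\s\\w");

    // Returns the text matched by the first find(), or null if nothing matched.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ORIGINAL, "intern"));  // int
        System.out.println(firstMatch(ORIGINAL, "floaty"));  // float
        System.out.println(firstMatch(ORIGINAL, "char c;")); // char c
        System.out.println(firstMatch(ORIGINAL, "charm"));   // null
    }
}
```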

To fix that, move the \s\w outside the group so it applies to all the alternatives.

 (?:
      int
   |                    # or,
      float
   |                    # or,
      char 
 )
 \s \w
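Java can compile this expanded layout almost as written: with the Pattern.COMMENTS flag, whitespace in the pattern is ignored and a # starts a comment that runs to the end of the line. A sketch (the class and method names are only for illustration); note that each line of the pattern ends with \n, otherwise a "# or," comment would swallow the rest of the pattern.

```java
import java.util.regex.Pattern;

public class CommentsModeDemo {
    // Same layout as the expanded form; COMMENTS mode ignores the spaces
    // and treats "# or," up to the newline as a comment.
    static final Pattern VERBOSE = Pattern.compile(
            "(?:               \n" +
            "     int          \n" +
            "  |               # or,\n" +
            "     float        \n" +
            "  |               # or,\n" +
            "     char         \n" +
            ")                 \n" +
            "\\s \\w",
            Pattern.COMMENTS);

    // True if the line contains a type keyword followed by whitespace and a word character.
    static boolean matches(String line) {
        return VERBOSE.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("int x"));  // true
        System.out.println(matches("char c")); // true
        System.out.println(matches("intern")); // false
    }
}
```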

The final regex, written as a Java string literal, is then "(?:int|float|char)\\s\\w".
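Plugged back into the loop from the question, it might look like the sketch below. To keep the example self-contained, it reads from a hypothetical in-memory SAMPLE string through a StringReader instead of new.c, and it uses try-with-resources so the reader gets closed. Treat it as an illustration rather than a drop-in replacement.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

public class CountDeclarations {
    // Final regex: the non-capturing group covers only the type names,
    // so \s\w is required after every alternative.
    static final Pattern DECL = Pattern.compile("(?:int|float|char)\\s\\w");

    // Hypothetical sample input standing in for the question's new.c.
    static final String SAMPLE = String.join("\n",
            "#include <stdio.h>",
            "int main(void) {",
            "    char c = 'a';",
            "    float f = 1.0f;",
            "    intern();",
            "    return 0;",
            "}");

    // Counts lines containing at least one match, like the loop in the question.
    static int countMatchingLines(Reader source) {
        int count = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (DECL.matcher(line).find()) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatchingLines(new StringReader(SAMPLE))); // 3
    }
}
```

With the original pattern, the same input gives 4, because the intern(); line matches the bare int alternative.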

Upvotes: 1
