Reputation: 121
I have the following regular expression in Java:
Pattern p = Pattern.compile("int|float|char\\s\\w");
But this still matches "intern" too.
Entire code:
package regex;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Regex {
    public static void main(String[] args) throws IOException {
        // TODO code application logic here
        int c = 0;
        BufferedReader bf = new BufferedReader(new FileReader("new.c"));
        String line;
        Pattern p = Pattern.compile("int|float|char\\s\\w");
        Matcher m;
        while ((line = bf.readLine()) != null) {
            m = p.matcher(line);
            if (m.find()) {
                c++;
            }
        }
        System.out.println(c);
    }
}
Upvotes: 0
Views: 131
Reputation: 490
Surround the options with parentheses like so:
Pattern p = Pattern.compile("(int|float|char)\\s\\w");
Also, if you want to cover some edge cases, such as badly formatted code, you can use:
Pattern p = Pattern.compile("^(\\s|\\t)*(int|float|char)(\\s|\\t)+[a-zA-Z_][a-zA-Z0-9_]*(\\s|\\t)*");
This should cover cases where more than one space or tab separates the type from the variable name, variable names that start with an underscore, and cases where "int", "float" or "char" is only the end of a longer word (for example "print x"), because the pattern is anchored to the start of the line.
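To make the difference between the two patterns above concrete, here is a minimal sketch that runs both against a few sample lines. The class name `DeclarationDemo`, the helper `found`, and the sample strings are made up for illustration; only the two regexes come from this answer.

```java
import java.util.regex.Pattern;

public class DeclarationDemo {
    // The simple grouped pattern from this answer.
    static final Pattern SIMPLE =
            Pattern.compile("(int|float|char)\\s\\w");

    // The stricter, start-anchored pattern from this answer.
    static final Pattern STRICT = Pattern.compile(
            "^(\\s|\\t)*(int|float|char)(\\s|\\t)+[a-zA-Z_][a-zA-Z0-9_]*(\\s|\\t)*");

    // Hypothetical helper: true if the pattern occurs anywhere in the line.
    static boolean found(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from the asker's file.
        String[] samples = {"  int   _count = 0;", "int\tx;", "print x", "intern"};
        for (String s : samples) {
            System.out.println("[" + s + "] simple=" + found(SIMPLE, s)
                    + " strict=" + found(STRICT, s));
        }
    }
}
```

With these samples, "print x" is found by the simple pattern (it contains "int x") but not by the anchored one, and neither pattern matches "intern". Note that the anchored pattern only recognises declarations at the start of a line, so something like a declaration inside a `for (...)` header would be skipped.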
Upvotes: 0
Reputation:
I assume you mean to find one of the alternatives, then followed by a space and a word.
But if you expand your regex,
(?:
int
| # or,
float
| # or,
char \s \w
)
you can see from the list that the \s\w applies only to the char alternative.
To fix that, bring the \s\w outside of the group so it applies to all the alternatives.
(?:
int
| # or,
float
| # or,
char
)
\s \w
The final regex is then "(?:int|float|char)\\s\\w"
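As a quick check of the explanation above, this small sketch compares the original pattern with the grouped one. The class name `AlternationDemo`, the helper `found`, and the sample inputs are assumptions chosen for illustration; the two regexes are the ones from the question and from this answer.

```java
import java.util.regex.Pattern;

public class AlternationDemo {
    // Original pattern: \s\w binds only to the "char" alternative.
    static final Pattern ORIGINAL =
            Pattern.compile("int|float|char\\s\\w");

    // Grouped pattern: \s\w applies to all three alternatives.
    static final Pattern GROUPED =
            Pattern.compile("(?:int|float|char)\\s\\w");

    // Hypothetical helper: true if the pattern occurs anywhere in the line.
    static boolean found(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs.
        String[] samples = {"intern", "int x;", "float y;", "char c;"};
        for (String s : samples) {
            System.out.println(s + " -> original=" + found(ORIGINAL, s)
                    + ", grouped=" + found(GROUPED, s));
        }
    }
}
```

Here "intern" is found by the original pattern, because the bare alternative int matches on its own, but not by the grouped pattern, which requires a whitespace character and a word character after the type name.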
Upvotes: 1