Reputation: 2040
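I need to write a Java regex that will recognize the following 3 cases:
a string made up only of the characters A, C, T, G (upper or lower case) and colons
or
the single character "?"
or
the literal string "NTC"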
I will list what I have tried so far and the errors that have arisen.
public static final String VALID_STRING = "[ACTGactg:]*";
// Matches the first case but not the second or third
// as expected.
public static final String VALID_STRING = "\\?|[ACTGactg:]*";
// Matches all 3 conditions when my understanding leads me to
// believe that it should not accept the third case of "NTC"
public static final String VALID_STRING = "?|[ACTGactg:]*";
// Yields PatternSyntaxException dangling metacharacter ?
What I would expect to be accurate is the following:
public static final String VALID_STRING = "NTC|\\?|[ACTGacgt:]*";
But I want to make sure that if I take away the "NTC" alternative, any "NTC" string will be reported as invalid.
Here is the method I am using to test these regexes.
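// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
private static boolean isValid(String thisString){
    boolean valid = false;
    Pattern checkRegex = Pattern.compile(VALID_STRING);
    Matcher matchRegex = checkRegex.matcher(thisString);
    // find() scans for the pattern anywhere in the input, not the whole input
    while (matchRegex.find()){
        // length() is a method on String, so it needs parentheses
        if (matchRegex.group().length() != 0){
            valid = true;
        }
    }
    return valid;
}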
So here are my closing questions:
Could the "\\?" regex possibly be acting as a wildcard character that is accepting the "NTC" string?
Are the or operators "|" appropriate here?
Do I need to make use of parentheses when using these or operators?
Here are some example incoming strings:
Thank you
Upvotes: 0
Views: 1065
Reputation: 401
Yes the provided regex would be ok:
public static final String VALID_STRING = "NTC|\\?|[ACTGacgt:]+";
...
boolean valid = str.matches(VALID_STRING);
If you remove NTC| from the regex, the string NTC becomes invalid.
You can test it and experiment yourself here.
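As a rough illustration of the point above, here is a minimal sketch that checks a few inputs with and without the NTC alternative, using whole-string matching. The class name ValidStringDemo, the constants WITH_NTC and WITHOUT_NTC, and the two-argument isValid helper are made up for this example and are not from the question.

```java
import java.util.regex.Pattern;

public class ValidStringDemo {
    // Accepts "NTC", a single "?", or one or more of A, C, T, G (either case) and ':'.
    static final Pattern WITH_NTC = Pattern.compile("NTC|\\?|[ACTGactg:]+");
    // Same pattern with the NTC alternative removed.
    static final Pattern WITHOUT_NTC = Pattern.compile("\\?|[ACTGactg:]+");

    // matches() succeeds only if the entire input matches the pattern.
    static boolean isValid(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"A:C", "?", "NTC", ""}) {
            System.out.println("\"" + s + "\" with NTC: " + isValid(WITH_NTC, s)
                    + ", without NTC: " + isValid(WITHOUT_NTC, s));
        }
    }
}
```

Because the character class uses + rather than *, the empty string is rejected by both patterns without any extra length check.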
Upvotes: 2
Reputation: 34608
Since you are using the Matcher.find() method, you are looking for your pattern anywhere in the string.
This means the strings A:C, T:G, AA:CC etc. match in their entirety. But how about NTC?
It matches because find() looks for a match anywhere. The TC part of it matches, therefore you get true.
If you want to match only the strings in their entirety, either use the matches() method, or use ^ and $.
Note that you don't have to check that the match is longer than 0 if you change your pattern to [ACTGactg:]+ instead of [ACTGactg:]*.
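To make the difference concrete, here is a small sketch comparing find() and matches() on the same inputs, plus an anchored variant of the pattern. The class name FindVersusMatches, the constants UNANCHORED and ANCHORED, and the helper methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

public class FindVersusMatches {
    // Pattern without the NTC alternative and without anchors.
    static final String UNANCHORED = "\\?|[ACTGactg:]+";
    // Same alternatives wrapped in a non-capturing group so ^ and $ apply to both.
    static final String ANCHORED = "^(?:\\?|[ACTGactg:]+)$";

    // find() reports a match if the pattern occurs anywhere in the input.
    static boolean findAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // matches() reports a match only if the whole input fits the pattern.
    static boolean matchWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // find() picks up the "TC" inside "NTC"; matches() and the anchored form do not.
        System.out.println(findAnywhere(UNANCHORED, "NTC"));
        System.out.println(matchWhole(UNANCHORED, "NTC"));
        System.out.println(findAnywhere(ANCHORED, "NTC"));
        System.out.println(findAnywhere(ANCHORED, "A:C"));
    }
}
```

The group in the anchored pattern matters: without it, ^ would bind only to the first alternative and $ only to the last, since | has the lowest precedence in a regex.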
Upvotes: 2