Reputation: 718
I tried searching for an answer to this question and also reading the Regex Wiki but I couldn't find what I'm looking for exactly.
I have a program that validates a document. (It was written by someone else.)
If certain lines or characters don't match a regex, an error is generated. I've noticed that a few false errors are always generated, and I want to correct this. I believe I have narrowed the problem down to the following example.
This error is flagged by the program logic:
ERROR: File header immediate origin name is invalid: CITIBANK, N.A.
Here is the code that causes that error:
if(strLine.substring(63,86).matches("[A-Z,a-z,0-9, ]+")){
}else{
    JOptionPane.showMessageDialog(null, "ERROR: File header immediate origin name is invalid: "+strLine.substring(63,86));
    errorFound=true;
    fileHeaderErrorFound=true;
    bw.write("ERROR: File header immediate origin name is invalid: "+strLine.substring(63,86));
    bw.newLine();
}
I believe the error is raised at runtime because the text contains a period and a comma. I am unsure how to allow these in the regex.
I have tried using this:
if(strLine.substring(63,86).matches("[A-Z,a-z,0-9,,,. ]+")){
It seemed to work, but I wanted to make sure that this is the correct way, because it doesn't look right.
Upvotes: 1
Views: 2449
Reputation: 129
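Letters and digits (a-zA-Z0-9) can be replaced by \w, which denotes 'word' characters; note that \w also matches the underscore, so it is slightly broader. The period and comma don't need escaping inside a character class and can be used as is. In a Java string literal the backslash has to be doubled, and the class still needs the space and the + quantifier from the original, so this regex might come in handy:
"[\\w,. ]+"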
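To see how \w compares with the explicit ranges, here is a minimal sketch. The class name WordClassDemo, its method names, and the sample string "ACME_BANK" are made up for illustration and are not part of the original program.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the validator from the question.
public class WordClassDemo {
    // Explicit ASCII ranges plus comma, period, and space.
    static final Pattern EXPLICIT = Pattern.compile("[A-Za-z0-9,. ]+");
    // Same idea with \w; the backslash is doubled inside a Java string literal.
    static final Pattern WORD = Pattern.compile("[\\w,. ]+");

    static boolean matchesExplicit(String s) {
        return EXPLICIT.matcher(s).matches();
    }

    static boolean matchesWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "CITIBANK, N.A.";
        System.out.println(matchesExplicit(name));        // true
        System.out.println(matchesWord(name));            // true
        // The two classes differ on the underscore, which \w accepts.
        System.out.println(matchesExplicit("ACME_BANK")); // false
        System.out.println(matchesWord("ACME_BANK"));     // true
    }
}
```

If underscores should be rejected in this field, the explicit ranges are the safer choice.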
Hope this helps. :)
Upvotes: 0
Reputation: 336158
You're right in your analysis: the match failed because there was a dot in the text that isn't contained in the character class. (The comma was already allowed, since the commas in the original class are matched literally.)
However, you can simplify the regex. There is no need to repeat the commas, because they don't have any special meaning inside a class:
if(strLine.substring(63,86).matches("[A-Za-z0-9,. ]+"))
Are you sure that you'll never have to match non-ASCII letters or any other kind of punctuation, though?
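As a rough illustration of that last question, the sketch below compares the ASCII-only class with a Unicode-aware variant built on \p{L} (any letter) and \p{N} (any number). The class name OriginNameCheck, its method names, and the sample inputs other than "CITIBANK, N.A." are assumptions for the example, not part of the original validator.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
public class OriginNameCheck {
    // ASCII letters and digits plus comma, period, and space.
    static final Pattern ASCII_NAME = Pattern.compile("[A-Za-z0-9,. ]+");
    // Any Unicode letter or number plus comma, period, and space.
    static final Pattern UNICODE_NAME = Pattern.compile("[\\p{L}\\p{N},. ]+");

    static boolean isValidAscii(String field) {
        return ASCII_NAME.matcher(field).matches();
    }

    static boolean isValidUnicode(String field) {
        return UNICODE_NAME.matcher(field).matches();
    }

    public static void main(String[] args) {
        String plain = "CITIBANK, N.A.";
        // Accented letters written as Unicode escapes to avoid source-encoding issues.
        String accented = "SOCI\u00C9T\u00C9 G\u00C9N\u00C9RALE";
        String ampersand = "BANK & TRUST";

        System.out.println(isValidAscii(plain));        // true
        System.out.println(isValidAscii(accented));     // false
        System.out.println(isValidUnicode(accented));   // true
        // Neither class allows '&', so other punctuation still has to be added explicitly.
        System.out.println(isValidUnicode(ampersand));  // false
    }
}
```

Whether the broader class is appropriate depends on what the file format actually permits in this field.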
Upvotes: 5