jesric1029

Reputation: 718

Regex for commas and periods allowed

I tried searching for an answer to this question and also read the Regex Wiki, but I couldn't find exactly what I'm looking for.

I have a program that validates a document. (It was written by someone else).

If certain lines or characters don't match a regex, an error is generated. I've noticed that a few false errors are always generated, and I want to correct this. I believe I have narrowed the problem down to the following example.

This error is flagged by the program logic:

ERROR: File header immediate origin name is invalid: CITIBANK, N.A. 

Here is the code that causes that error:

if(strLine.substring(63,86).matches("[A-Z,a-z,0-9, ]+")){
    // field is valid, nothing to do
}else{
    JOptionPane.showMessageDialog(null, "ERROR: File header immediate origin name is invalid: "+strLine.substring(63,86));
    errorFound=true;
    fileHeaderErrorFound=true;
    bw.write("ERROR: File header immediate origin name is invalid: "+strLine.substring(63,86));
    bw.newLine();
}

I believe the error is raised at runtime because the text contains a period and a comma. I am unsure how to allow these in the regex.

I have tried using this:

if(strLine.substring(63,86).matches("[A-Z,a-z,0-9,,,. ]+")){

It seemed to work, but I wanted to make sure this is the correct way, because it doesn't look right.

Upvotes: 1

Views: 2449

Answers (2)

oliver_48

Reputation: 129

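Letters and digits (a-zA-Z0-9) can mostly be replaced by \w, which denotes "word" characters; note that \w also matches the underscore. The period and comma don't need escaping inside a character class and can be used as is. In a Java string literal the backslash has to be doubled, and to be used with matches() the class also needs the space and a + quantifier. Hence this regex might come in handy:

"[\\w,. ]+"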

Hope this helps. :)
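To make the \w suggestion above concrete, here is a minimal sketch of how it might behave in Java. The class name WordClassDemo and the method isValid are made up for illustration and are not part of the original program. It assumes the default (ASCII) meaning of \w, which covers letters, digits and the underscore, so it is slightly more permissive than a-zA-Z0-9.

```java
import java.util.regex.Pattern;

class WordClassDemo {
    // Hypothetical pattern: \w (letters, digits, underscore) plus comma, period and space.
    // The backslash is doubled because this is a Java string literal.
    static final Pattern ORIGIN_NAME = Pattern.compile("[\\w,. ]+");

    // Returns true only if the whole field consists of allowed characters.
    static boolean isValid(String field) {
        return ORIGIN_NAME.matcher(field).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("CITIBANK, N.A."));  // true
        System.out.println(isValid("CITI_BANK"));       // true: \w also accepts '_'
        System.out.println(isValid("CITIBANK & CO"));   // false: '&' is not in the class
    }
}
```

If the underscore should be rejected, spelling the ranges out as A-Za-z0-9 is probably the safer choice.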

Upvotes: 0

Tim Pietzcker

Reputation: 336158

You're right in your analysis: the match failed because the text contains a dot that isn't in the character class.

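However, you can simplify the regex. There is no need to repeat the commas: they don't have any special meaning inside a class, so each one just matches a literal comma (which is why commas were already accepted by the original pattern).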

if(strLine.substring(63,86).matches("[A-Za-z0-9,. ]+"))

Are you sure that you'll never have to match non-ASCII letters or any other kind of punctuation, though?
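As a rough sketch of the two points above, the following compares the simplified ASCII class with one possible way to admit non-ASCII letters. The class name OriginNameCheck, its method names, and the \p{L} variant are assumptions for illustration only; whether the file format actually permits such characters is something you would have to check against its specification.

```java
import java.util.regex.Pattern;

class OriginNameCheck {
    // The simplified class from above: ASCII letters, digits, comma, period, space.
    static final Pattern ASCII_NAME = Pattern.compile("[A-Za-z0-9,. ]+");

    // Hypothetical variant, not from the original program:
    // \p{L} matches any Unicode letter instead of only A-Z and a-z.
    static final Pattern UNICODE_NAME = Pattern.compile("[\\p{L}0-9,. ]+");

    static boolean isAsciiName(String s) {
        return ASCII_NAME.matcher(s).matches();
    }

    static boolean isUnicodeName(String s) {
        return UNICODE_NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        String padded = "CITIBANK, N.A. ";         // trailing space, as in a fixed-width field
        String accented = "CR\u00C9DIT AGRICOLE";  // contains an accented capital E

        System.out.println(isAsciiName(padded));      // true
        System.out.println(isAsciiName(accented));    // false
        System.out.println(isUnicodeName(accented));  // true
    }
}
```

Compiling the pattern once as a constant also avoids recompiling it on every call, which String.matches() does internally.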

Upvotes: 5
