Reputation: 2494
I have a file that needs to be read using regular expressions. A line can contain basically anything (upper case, lower case, spaces, symbols, etc.) so long as it is no more than 60 characters. The method I tried works for most strings in the file, but I also need to allow quotation marks, which is where I'm getting stuck. Here's what I've tried so far:
else if (data.matches("[A-Za-z0-9 ,.?!%&()@$-_:;\\\"]+$")
        && !label.equals("") && prompt.equals("") && data.length() <= 60)
{
    prompt = data;
}
It reads everything else in fine, except the following string:
Yes, but an error is displayed, “Fuser out.”
Don't ask about the spelling, that was what was in the sample file I was given.
Thanks for any help, hopefully I'll get a response before I leave the laundromat, since I'm on Long Island and have no power or internet at home thanks to the hurricane.
Upvotes: 1
Views: 14565
Reputation: 34367
Add \" to your regular expression, for example:
data.matches("[A-Za-z0-9 ,.?!%&()@$-_:;\"\\\\]+$")
Wherever \" appears in the Java string, the regex treats it as a literal " to match.
Upvotes: 2
Reputation: 208455
You are probably having issues matching that string because it uses smart quotes. The following article has some good information on this: Handy regexes for smart quotes
The summary is that you can add those characters to your regex using the following Unicode escapes:
\u201C\u201D\u201E\u201F\u2033\u2036
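To illustrate why the ASCII " in a character class does not cover the quotes in the sample line, here is a minimal sketch. The class name `SmartQuoteCheck`, the two patterns, and the helper `allowed` are made up for this example; only the sample text comes from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative only.
class SmartQuoteCheck {
    // Character class that contains only the ASCII double quote.
    static final Pattern ASCII_ONLY = Pattern.compile("[A-Za-z ,.\"]+");
    // Same class, extended with the curly quotes U+201C and U+201D.
    static final Pattern WITH_CURLY = Pattern.compile("[A-Za-z ,.\"\\u201C\\u201D]+");

    // True when the whole string consists of characters from the class.
    static boolean allowed(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        String sample = "Yes, but an error is displayed, \u201CFuser out.\u201D";
        System.out.println(allowed(ASCII_ONLY, sample)); // false
        System.out.println(allowed(WITH_CURLY, sample)); // true
    }
}
```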
In addition, it looks like you intend to allow both backslashes and double quotes in your character class by using \" in your regex (\\\" in the Java string). This does not do what you think it does: \" just matches a literal " character in your regex, with an unnecessary backslash. To actually include backslashes as valid characters, you need four consecutive backslashes in your Java string.
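The difference between the two escapes can be checked with a small sketch. The class `BackslashCheck` and its constants are hypothetical names chosen for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class BackslashCheck {
    // Java string "[\\\"]" is the regex [\"] : an escaped quote and nothing else.
    static final String QUOTE_ONLY = "[\\\"]";
    // Java string "[\\\\\"]" is the regex [\\"] : a backslash or a quote.
    static final String QUOTE_AND_BACKSLASH = "[\\\\\"]";

    public static void main(String[] args) {
        String quote = "\"";     // a single double-quote character
        String backslash = "\\"; // a single backslash character
        System.out.println(quote.matches(QUOTE_ONLY));              // true
        System.out.println(backslash.matches(QUOTE_ONLY));          // false
        System.out.println(quote.matches(QUOTE_AND_BACKSLASH));     // true
        System.out.println(backslash.matches(QUOTE_AND_BACKSLASH)); // true
    }
}
```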
You also need to escape the hyphen; otherwise $-_ is interpreted as a character range.
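A quick way to see the effect of the unescaped hyphen, as a sketch with hypothetical helper names (`unescaped`, `escaped`): the range from $ to _ silently admits characters such as * and < that were never listed.

```java
// Hypothetical demo class; names are illustrative only.
class HyphenRangeCheck {
    // Unescaped hyphen: $-_ is the range U+0024 through U+005F.
    static boolean unescaped(String s) {
        return s.matches("[$-_]");
    }

    // Escaped hyphen: only the three literal characters $, - and _.
    static boolean escaped(String s) {
        return s.matches("[$\\-_]");
    }

    public static void main(String[] args) {
        System.out.println(unescaped("*")); // true, * falls inside the range
        System.out.println(unescaped("<")); // true, < falls inside the range
        System.out.println(escaped("*"));   // false
        System.out.println(escaped("-"));   // true
    }
}
```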
So your new regex would look something like this:
data.matches("[A-Za-z0-9 ,.?!%&()@$\\-_:;\\\\\"\\u201C\\u201D\\u201E\\u201F\\u2033\\u2036]+$")
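As a rough check of that regex against the line from the question, something like the following could be used. The class `PromptLineCheck` and the method `isValid` are hypothetical; the 60-character limit is taken from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class PromptLineCheck {
    // The character class from the answer above, compiled once.
    static final Pattern ALLOWED = Pattern.compile(
        "[A-Za-z0-9 ,.?!%&()@$\\-_:;\\\\\"\\u201C\\u201D\\u201E\\u201F\\u2033\\u2036]+$");

    // A line is valid if it is at most 60 characters and uses only allowed characters.
    static boolean isValid(String data) {
        return data.length() <= 60 && ALLOWED.matcher(data).matches();
    }

    public static void main(String[] args) {
        String sample = "Yes, but an error is displayed, \u201CFuser out.\u201D";
        System.out.println(isValid(sample)); // true
        System.out.println(isValid("a*b"));  // false, * is no longer covered by a range
    }
}
```

Note that this class does not list the apostrophe, so a line such as "Don't" would be rejected unless ' is added.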
Upvotes: 0
Reputation: 2494
Here's how I ended up simplifying it, if anyone is interested.
if (data.matches("\n"))
{
    // do nothing, ignore
}
else if (data.matches("[^ ]+$") && label.equals("")
        && data.length() <= 60)
{
    label = data;
}
else if (data.matches(".+$")
        && !label.equals("") && prompt.equals("") && data.length() <= 60)
{
    prompt = data;
}
else if (data.matches(".+$")
        && !label.equals("") && !prompt.equals("") && message.equals("")
        && data.length() <= 60)
{
    message = data;
}
else if (data.matches("[^ ]+[ ]+[0-9]$") && label.equals("")
        && prompt.equals("") && message.equals("") && data.length() <= 60)
{
    children = data;
    String[] info = children.split("[ ]+");
    parent = info[0];
    numChildren = Integer.parseInt(info[1]);
    tree.getNodeReference(parent).setNumChildren(numChildren);
}
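A side note on the patterns above, as a small sketch with hypothetical helper names: `String.matches` already requires the entire string to match, so `.+` accepts any non-empty line (curly quotes included) as long as it contains no line terminator, and `[^ ]+` rejects anything containing a space.

```java
// Hypothetical demo class; names are illustrative only.
class MatchesAnchoring {
    // Any non-empty string without a line terminator.
    static boolean nonEmptyLine(String s) {
        return s.matches(".+");
    }

    // Any non-empty string without a space character.
    static boolean singleToken(String s) {
        return s.matches("[^ ]+");
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyLine("Yes, \u201CFuser out.\u201D")); // true
        System.out.println(nonEmptyLine(""));                            // false
        System.out.println(nonEmptyLine("abc\ndef"));                    // false
        System.out.println(singleToken("two words"));                    // false
    }
}
```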
Upvotes: 0
Reputation: 1055
This is copied and pasted from your code:
"Yes, but an error is displayed, \"Fuser out.\"".matches("[A-Za-z0-9 ,.?!%&()@$-_:;\\\"]+$");
It returns true, so the regex itself is fine.
But I run into a problem when I copy and paste the string from your question instead. Is the quote character in your string “Fuser out.” a different character from the " in your regular expression?
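One way to check which characters are actually in the string is to list the code points of everything outside ASCII. This is only a sketch; `QuoteInspector` and `nonAscii` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class QuoteInspector {
    // Returns one "U+XXXX" entry for each non-ASCII code point in the input.
    static List<String> nonAscii(String s) {
        List<String> found = new ArrayList<>();
        s.codePoints()
         .filter(cp -> cp > 127)
         .forEach(cp -> found.add(String.format("U+%04X", cp)));
        return found;
    }

    public static void main(String[] args) {
        String sample = "Yes, but an error is displayed, \u201CFuser out.\u201D";
        System.out.println(nonAscii(sample)); // [U+201C, U+201D]
    }
}
```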
Upvotes: 0