audiFanatic

Reputation: 2494

Match regular expression with quotes inside?

I have a file that needs to be read using regular expressions. A line can include basically anything (upper case, lower case, spaces, symbols, etc.) as long as it is no more than 60 characters. The method I tried works for most strings in the file, but I also need to allow quotation marks, which is where I'm getting stuck. Here's what I've tried so far:

    else if (data.matches("[A-Za-z0-9 ,.?!%&()@$-_:;\\\"]+$")
            && !label.equals("") && prompt.equals("") && data.length() <= 60)
        {
            prompt = data;
        }

It reads everything else in fine, except for the following string:

    Yes, but an error is displayed, “Fuser out.”

Don't ask about the spelling, that was what was in the sample file I was given.

Thanks for any help, hopefully I'll get a response before I leave the laundromat, since I'm on Long Island and have no power or internet at home thanks to the hurricane.

Upvotes: 1

Views: 14565

Answers (4)

Yogendra Singh

Reputation: 34367

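Add \" to the character class in your regular expression, for example:

  data.matches("[A-Za-z0-9 ,.?!%&()@$-_:;\"\\\\]+$")

In the Java string, \" puts a literal " into the character class, and \\\\ becomes the regex \\, which matches a literal backslash. (A lone \\ directly before the ] would escape the bracket and leave the character class unclosed.)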
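To make the two layers of escaping visible, here is a minimal sketch. The class name `EscapeLayersDemo` and the helper `allOf` are made up for illustration and are not part of the question's code. The Java compiler processes the string literal first, and only the resulting characters reach the regex engine:

```java
class EscapeLayersDemo {
    // Hypothetical helper: true when the whole input consists only of
    // characters from the given character-class body.
    static boolean allOf(String classBody, String input) {
        return input.matches("[" + classBody + "]+");
    }

    public static void main(String[] args) {
        // The literal \" is a single character: a double quote.
        System.out.println("\"".length());
        // The literal \\\" is two characters: a backslash, then a quote.
        System.out.println("\\\"".length());

        // Both forms allow a double quote inside the class.
        System.out.println(allOf("a-z \"", "say \"hi\""));
        System.out.println(allOf("a-z \\\"", "say \"hi\""));

        // Neither form allows a backslash in the input...
        System.out.println(allOf("a-z \\\"", "a\\b"));
        // ...that takes four backslashes in the Java literal.
        System.out.println(allOf("a-z\\\\", "a\\b"));
    }
}
```

Printing the pattern string before passing it to `matches` is a quick way to check what the regex engine actually receives.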

Upvotes: 2

Andrew Clark

Reputation: 208455

You are probably having issues matching that string because it uses smart quotes. The following article has some good information on this: Handy regexes for smart quotes

The summary is that you can add those characters to your regex using the following Unicode escapes:

\u201C\u201D\u201E\u201F\u2033\u2036

In addition, it looks like you intend to allow both backslashes and double quotes in your character class by using \" in your regex (\\\" in the Java string). This is not doing what you think it is: \" just matches a literal " character in your regex, with an unnecessary backslash. To actually include backslashes as valid characters, you need four consecutive backslashes in your Java string.

You also need to escape the hyphen; otherwise $-_ is interpreted as a character range (every character from $ through _).

So your new regex would look something like this:

data.matches("[A-Za-z0-9 ,.?!%&()@$\\-_:;\\\\\"\\u201C\\u201D\\u201E\\u201F\\u2033\\u2036]+$")
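As a rough check, the sketch below runs the sample line against both patterns. The class name `SmartQuoteDemo` and the constant names are assumptions for illustration; the sample line is written with Unicode escapes so the curly quotes survive copy and paste:

```java
class SmartQuoteDemo {
    // Character class from the question: ASCII only, with an unescaped hyphen.
    static final String ORIGINAL = "[A-Za-z0-9 ,.?!%&()@$-_:;\\\"]+$";

    // Revised class: escaped hyphen, literal backslash, Unicode double quotes.
    static final String REVISED =
        "[A-Za-z0-9 ,.?!%&()@$\\-_:;\\\\\"\\u201C\\u201D\\u201E\\u201F\\u2033\\u2036]+$";

    public static void main(String[] args) {
        // The sample line from the file, curly quotes included.
        String sample = "Yes, but an error is displayed, \u201CFuser out.\u201D";

        System.out.println(sample.matches(ORIGINAL)); // false
        System.out.println(sample.matches(REVISED));  // true

        // '<' lies between '$' and '_', so the unescaped range lets it through.
        System.out.println("a<b".matches(ORIGINAL));  // true
        System.out.println("a<b".matches(REVISED));   // false
    }
}
```

The second pair of checks shows the side effect of the unescaped hyphen: the original pattern silently accepts characters that were never listed.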

Upvotes: 0

audiFanatic

Reputation: 2494

Here's how I ended up simplifying it, if anyone is interested.

    if (data.matches("\n"))
    {
        // do nothing, ignore
    }
    else if (data.matches("[^ ]+$") && label.equals("")
        && data.length() <= 60)
    {
        label = data;
    }
    else if (data.matches(".+$")
        && !label.equals("") && prompt.equals("") && data.length() <= 60)
    {
        prompt = data;
    }
    else if (data.matches(".+$")
        && !label.equals("") && !prompt.equals("") && message.equals("")
        && data.length() <= 60)
    {
        message = data;
    }
    else if (data.matches("[^ ]+[ ]+[0-9]$") && label.equals("")
        && prompt.equals("") && message.equals("") && data.length() <= 60)
    {
        children = data;
        String[] info = children.split("[ ]+");
        parent = info[0];
        numChildren = Integer.parseInt(info[1]);

        tree.getNodeReference(parent).setNumChildren(numChildren);
    }
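For anyone wondering why `.+` is enough here: `String.matches` only succeeds when the entire string matches, so the trailing `$` is optional, and `.` accepts any character except a line terminator, curly quotes included. A minimal sketch, with a hypothetical class and helper name:

```java
class WholeLineDemo {
    // Hypothetical helper: true for any non-empty string without a line break.
    static boolean isNonEmptyLine(String data) {
        return data.matches(".+");
    }

    public static void main(String[] args) {
        String sample = "Yes, but an error is displayed, \u201CFuser out.\u201D";

        // matches() must consume the whole string, not just part of it.
        System.out.println("abc".matches("b"));        // false

        // '.' accepts the curly quotes that the character class rejected.
        System.out.println(isNonEmptyLine(sample));    // true

        // By default '.' does not match a newline.
        System.out.println(isNonEmptyLine("a\nb"));    // false
    }
}
```

The trade-off is that `.+` no longer restricts which symbols are allowed, so only the length check and the line-break rule remain.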

Upvotes: 0

Chuidiang

Reputation: 1055

This is copied and pasted from your code:

    "Yes, but an error is displayed, \"Fuser out.\"".matches("[A-Za-z0-9 ,.?!%&()@$-_:;\\\"]+$");

It returns true, so the regex itself is OK.

But I ran into a problem when I copied and pasted from your code. Is the quote character in your string “Fuser out.” a different character than the " in your regular expression?
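One way to check is to print the code points of anything outside ASCII. This is only a sketch, and `QuoteCodePoints` and `nonAscii` are names invented for the example:

```java
import java.util.List;
import java.util.stream.Collectors;

class QuoteCodePoints {
    // Lists every non-ASCII character in the input as a U+ hex label.
    static List<String> nonAscii(String s) {
        return s.codePoints()
                .filter(cp -> cp > 0x7F)
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The line as it appears in the file, with curly quotes.
        String fromFile = "Yes, but an error is displayed, \u201CFuser out.\u201D";
        // The same line retyped with plain ASCII quotes (U+0022).
        String retyped = "Yes, but an error is displayed, \"Fuser out.\"";

        System.out.println(nonAscii(fromFile)); // [U+201C, U+201D]
        System.out.println(nonAscii(retyped));  // []
    }
}
```

If the first list is not empty, the file contains characters that an ASCII-only character class will never match.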

Upvotes: 0
