Yuki1112

Reputation: 365

Java Regex: find single quoted text with escapes

This method should return every single-quoted string in the input. However, an escaped single quote (\') must be treated as an ordinary apostrophe character rather than as a delimiter (just like an escaped double quote in a Java string literal). Example: for the input "This 'isn\'t' easy'" the method should return the single string "isn't".
My code:

public static List<String> findSingleQuotedTextWithEscapes(String input) {
    Pattern pattern = Pattern.compile("(?:\\w|'[^']*')+");
    Matcher matcher = pattern.matcher(input);
    List<String> ans = new ArrayList<>();
    while (matcher.find()) {
        ans.add(matcher.group().replace("'", ""));
    }
    return ans;
}

Input: "more'test'" Expected: [test] Actual: [more,test]
My pattern does not seem to isolate the text between the ' delimiters, so it ends up tokenizing everything. Please help. Here is the tester I am using:

void fillSingleQuotedTestInputs(List<String> inputs, List<List<String>> expect) {
        inputs.add("'test'"); expect.add(Arrays.asList("test"));
        inputs.add("more'test'"); expect.add(Arrays.asList("test"));
        inputs.add("'test'more"); expect.add(Arrays.asList("test"));
        inputs.add("\\'no'yes'"); expect.add(Arrays.asList("yes"));
        inputs.add("a 'one' and 'two' and 'three'..."); expect.add(Arrays.asList("one", "two", "three"));
        inputs.add("nothing at all"); expect.add(Arrays.<String>asList());
        inputs.add("''"); expect.add(Arrays.asList(""));
        inputs.add("''test"); expect.add(Arrays.asList(""));
        inputs.add("test''"); expect.add(Arrays.asList(""));
        inputs.add("te''st"); expect.add(Arrays.asList(""));
        inputs.add("'This is not wrong' and 'this isn\\'t either'"); expect.add(Arrays.asList("This is not wrong", "this isn't either"));
        inputs.add("'tw\\'o repl\\'acements' in 't\\'wo stri\\'ngs'."); expect.add(Arrays.asList("tw'o repl'acements", "t'wo stri'ngs"));
        inputs.add("'\\''"); expect.add(Arrays.asList("'"));
        inputs.add("'''"); expect.add(Arrays.asList(""));
        inputs.add("'test1'\n'test2'"); expect.add(Arrays.asList("test1", "test2"));
        inputs.add("''''"); expect.add(Arrays.asList("", "")); // This one is hard. Hint: \G
    }

    @Test
    public void testFindSingleQuotedTextWithEscapes() {
        ArrayList<String> inputs = new ArrayList<String>();
        ArrayList<List<String>> expect = new ArrayList<List<String>>();

        fillSingleQuotedTestInputs(inputs, expect);

        for (int i = 0; i < inputs.size(); ++i) {
            List<String> output = RegexpPractice.findSingleQuotedTextWithEscapes(inputs.get(i));
            assertEquals(String.format("Test %d failed: Search <<%s>>", i, inputs.get(i)), expect.get(i), output);
        }
    }

Upvotes: 1

Views: 526

Answers (2)

Joop Eggen

Reputation: 109547

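The pattern below handles two cases between the quotes: a backslash consumes the character that follows it, and everything else is matched as a run of characters that are neither backslashes nor apostrophes: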

// Raw text:  1.'2\'3\\'xx'x'
// Matches:     [^^^^^^]  [^]
String s = "1.'2\\'3\\\\'xx'x'"; // backslashes doubled for the Java string literal

List<String> findQuotedText(String s) {
    Pattern quotedPattern = Pattern.compile("'((\\\\.|[^\\\\']+)*)'");
    //                                        |    |    |        |
    //                                apostrophe   |    |       apostrophe
    //                                 backslash+any or non-apostrophes
    Matcher m = quotedPattern.matcher(s);
    List<String> results = new ArrayList<>();
    while (m.find()) {
        // Drop the escaping backslash: \' becomes ' and \\ becomes \
        results.add(m.group(1).replaceAll("\\\\(.)", "$1"));
    }
    return results;
}

Result:

2'3\
x
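One thing to watch with the pattern above: it will also start a match at an escaped apostrophe that sits outside any quotes, so the test input \'no'yes' would yield "no" instead of the expected "yes". A minimal sketch of one possible adjustment follows, assuming that a backslash directly before an apostrophe always escapes it. It puts a negative lookbehind in front of the opening quote and keeps the "backslash consumes the next character" idea for the body. The class name EscapeAwareQuoteFinder is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeAwareQuoteFinder {
    // Regex-level pattern: (?<!\\)'((?:\\.|[^\\'])*)'
    //   (?<!\\)'        opening quote that is not preceded by a backslash
    //   \\.             a backslash together with the character it escapes
    //   [^\\']          any other character except backslash and apostrophe
    // Every regex backslash is doubled once more inside the Java string literal.
    private static final Pattern QUOTED =
            Pattern.compile("(?<!\\\\)'((?:\\\\.|[^\\\\'])*)'");

    static List<String> findSingleQuotedTextWithEscapes(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            // Unescape the captured text: drop each escaping backslash, keep the escaped character.
            results.add(matcher.group(1).replaceAll("\\\\(.)", "$1"));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findSingleQuotedTextWithEscapes("\\'no'yes'"));             // prints [yes]
        System.out.println(findSingleQuotedTextWithEscapes("'this isn\\'t either'"));  // prints [this isn't either]
    }
}
```

Because escapes are consumed as pairs, an escaped backslash right before the closing quote (raw text 'a\\') is read as the text a\ followed by a real closing quote. The lookbehind only inspects a single character, so an escaped backslash directly before an opening quote outside any quoted string would still be misread. None of the listed test cases exercise that.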

Upvotes: 0

Nick Reed

Reputation: 5059

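It looks like (?<!\\)'(.*?)(?<!\\)' will meet your needs. It uses a negative lookbehind to assert that each ' it treats as a delimiter is not preceded by a \. In a Java string literal every backslash has to be doubled once more, which gives "(?<!\\\\)'(.*?)(?<!\\\\)'". The captured group still contains the backslash of each \' sequence, so replace \' with ' after matching. With that step, this passes all of the test cases shown in your code.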

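As a rough sketch of how the lookbehind regex could be wired into the method from the question (the class name LookbehindQuoteFinder is made up for illustration, and the replace call is the unescaping step the test cases expect):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindQuoteFinder {
    // Regex-level pattern: (?<!\\)'(.*?)(?<!\\)'
    // Each regex backslash is doubled once more inside the Java string literal.
    private static final Pattern QUOTED = Pattern.compile("(?<!\\\\)'(.*?)(?<!\\\\)'");

    static List<String> findSingleQuotedTextWithEscapes(String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            // The capture still holds the raw \' sequences, so turn them into plain apostrophes.
            results.add(matcher.group(1).replace("\\'", "'"));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findSingleQuotedTextWithEscapes("more'test'"));             // prints [test]
        System.out.println(findSingleQuotedTextWithEscapes("'this isn\\'t either'"));  // prints [this isn't either]
    }
}
```

One caveat with this approach: the lookbehind treats any ' that follows a \ as escaped, so a quoted string whose content ends in an escaped backslash (raw text 'a\\') would not be closed where you expect. That case does not appear in your tests.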

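If you want to do it without lookarounds, you can use (?:[^'\n\r]*?'()'|[^\\]'(.*?[^\\])'). Empty strings are captured in group 1 and all other quoted text in group 2. Note that the second alternative consumes one non-backslash character before the opening quote, so a non-empty quoted string at the very start of the input is not matched. This regex is also slower than the first one shown.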


Upvotes: 2
