Reputation: 365
This method should return all occurrences of single-quoted strings. However, an escaped single quote (\') should be treated as a regular single-quote character (just like an escaped double quote in Java). Example: given "This 'isn\'t' easy'", the method should return the single string "isn't".
My code:
public static List<String> findSingleQuotedTextWithEscapes(String input) {
    Pattern pattern = Pattern.compile("(?:\\w|'[^']*')+");
    Matcher matcher = pattern.matcher(input);
    ArrayList ans = new ArrayList();
    while (matcher.find()) {
        ans.add(matcher.group().replace("'", ""));
    }
    return ans;
}
Input: "more'test'"
Expected: [test]
Actual: [more,test]
I seem to have trouble matching only the text between ' characters; instead, I'm tokenizing everything. Please help.
Adding tester:
void fillSingleQuotedTestInputs(List<String> inputs, List<List<String>> expect) {
    inputs.add("'test'"); expect.add(Arrays.asList("test"));
    inputs.add("more'test'"); expect.add(Arrays.asList("test"));
    inputs.add("'test'more"); expect.add(Arrays.asList("test"));
    inputs.add("\\'no'yes'"); expect.add(Arrays.asList("yes"));
    inputs.add("a 'one' and 'two' and 'three'..."); expect.add(Arrays.asList("one", "two", "three"));
    inputs.add("nothing at all"); expect.add(Arrays.<String>asList());
    inputs.add("''"); expect.add(Arrays.asList(""));
    inputs.add("''test"); expect.add(Arrays.asList(""));
    inputs.add("test''"); expect.add(Arrays.asList(""));
    inputs.add("te''st"); expect.add(Arrays.asList(""));
    inputs.add("'This is not wrong' and 'this isn\\'t either'"); expect.add(Arrays.asList("This is not wrong", "this isn't either"));
    inputs.add("'tw\\'o repl\\'acements' in 't\\'wo stri\\'ngs'."); expect.add(Arrays.asList("tw'o repl'acements", "t'wo stri'ngs"));
    inputs.add("'\\''"); expect.add(Arrays.asList("'"));
    inputs.add("'''"); expect.add(Arrays.asList(""));
    inputs.add("'test1'\n'test2'"); expect.add(Arrays.asList("test1", "test2"));
    inputs.add("''''"); expect.add(Arrays.asList("", "")); // This one is hard. Hint: \G
}

@Test
public void testFindSingleQuotedTextWithEscapes() {
    ArrayList<String> inputs = new ArrayList<String>();
    ArrayList<List<String>> expect = new ArrayList<List<String>>();
    fillSingleQuotedTestInputs(inputs, expect);
    for (int i = 0; i < inputs.size(); ++i) {
        List<String> output = RegexpPractice.findSingleQuotedTextWithEscapes(inputs.get(i));
        assertEquals(String.format("Test %d failed: Search <<%s>>", i, inputs.get(i)), expect.get(i), output);
    }
}
Upvotes: 1
Views: 526
Reputation: 109547
Inside the quotes, let a backslash consume the character after it, and otherwise match runs of characters that are neither backslashes nor apostrophes:
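import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Java source needs each backslash doubled; the actual text is: 1.'2\'3\\'xx'x'
// The quoted strings in it are '2\'3\\' and 'x'.
String s = "1.'2\\'3\\\\'xx'x'";

List<String> findQuotedText(String s) {
    Pattern quotedPattern = Pattern.compile("'((\\\\.|[^\\\\']+)*)'");
    //   '            opening apostrophe
    //   \\\\.        backslash + any character
    //   [^\\\\']+    or a run of non-backslash, non-apostrophe characters
    //   '            closing apostrophe
    Matcher m = quotedPattern.matcher(s);
    List<String> results = new ArrayList<>();
    while (m.find()) {
        // Drop each escaping backslash, keeping the character it escapes.
        results.add(m.group(1).replaceAll("\\\\(.)", "$1"));
    }
    return results;
}
Result:
2'3\
x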
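The pattern above only looks at backslashes inside the quotes, so an escaped apostrophe before the first string (as in the \'no'yes' test case) would still open a match. One possible variation, offered only as a sketch, is to add a second alternative that consumes any backslash-plus-character pair found outside quotes. The class name QuotedTextSketch and its find method are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTextSketch {
    // Alternative 1: a backslash plus the character it escapes (skipped outside quotes).
    // Alternative 2: a quoted string; group 1 holds its raw, still-escaped content.
    private static final Pattern TOKEN =
            Pattern.compile("\\\\.|'((?:[^\\\\']|\\\\.)*)'", Pattern.DOTALL);

    static List<String> find(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            String raw = m.group(1);
            // group 1 is null when alternative 1 matched, i.e. an escape outside quotes.
            if (raw != null) {
                // Unescape: drop each backslash and keep the character after it.
                results.add(raw.replaceAll("(?s)\\\\(.)", "$1"));
            }
        }
        return results;
    }
}
```

With this sketch, QuotedTextSketch.find("\\'no'yes'") returns [yes] and QuotedTextSketch.find("''''") returns two empty strings, because each find() call resumes where the previous match ended.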
Upvotes: 0
Reputation: 5059
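It looks like (?<!\\)'(.*?)(?<!\\)' will meet all of your needs. It uses a negative lookbehind to assert that, when matching ', there isn't a \ behind it; the quoted text ends up in capture group 1. This passes all of the test cases shown in your code, provided each \' inside the captured text is then replaced with a plain '.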
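As a rough illustration of how the lookbehind regex could be wired into the method, here is a minimal sketch. The class name LookbehindSketch is hypothetical, and the replace call that turns \' into ' is an assumption about how the captured text should be unescaped. One caveat: the lookbehind inspects only a single character, so an escaped backslash directly before a closing quote (for example 'a\\') would be mistaken for an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookbehindSketch {
    // Regex: (?<!\\)'(.*?)(?<!\\)'
    // Both the opening and the closing apostrophe must not be preceded by a backslash.
    private static final Pattern QUOTED =
            Pattern.compile("(?<!\\\\)'(.*?)(?<!\\\\)'");

    static List<String> find(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            // Group 1 still contains the escaping backslashes; turn \' into '.
            results.add(m.group(1).replace("\\'", "'"));
        }
        return results;
    }
}
```

For example, LookbehindSketch.find("'tw\\'o repl\\'acements' in 't\\'wo stri\\'ngs'.") returns [tw'o repl'acements, t'wo stri'ngs].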
If you want to do it without lookarounds, you can use (?:[^'\n\r]*?'()'|[^\\]'(.*?[^\\])'). Note that this performs more slowly than the first regex shown.
Upvotes: 2