Reputation: 9827
I have a regex that uses the pattern "'''.*?'''|'.*?'" to look for text between triple quotes (''') and single quotes ('). When line breaks are added to the input String, the pattern fails to read to the end of the triple quote. Any idea how to change the regex so it reads to the closing triple quote and does not break on the \n? In the failing case below, quoteMatcher.end() returns 2, so only the first two quote characters are matched and the result is ''''''.
Works:
'''<html><head></head></html>'''
Fails:
User Entered Value:
'''<html>
<head></head>
</html>'''
Java Representation:
'''<html>\n<head></head>\n</html>'''
Parsing Logic:
public static final Pattern QUOTE_PATTERN = Pattern.compile("'''.*?'''|'.*?'");

Matcher quoteMatcher = QUOTE_PATTERN.matcher(value);
int normalPos = 0, length = value.length();
while (normalPos < length && quoteMatcher.find()) {
    int quotePos = quoteMatcher.start(), quoteEnd = quoteMatcher.end();
    if (normalPos < quotePos) {
        copyBuilder.append(stripHTML(value.substring(normalPos, quotePos)));
    }
    // quoteEnd fails to read to the end due to \n
    copyBuilder.append(value.substring(quotePos, quoteEnd));
    normalPos = quoteEnd;
}
if (normalPos < length) copyBuilder.append(stripHTML(value.substring(normalPos)));
Upvotes: 0
Views: 125
Reputation: 3176
Simply use the Pattern.DOTALL modifier so the . also matches line breaks.
public static final Pattern QUOTE_PATTERN = Pattern.compile("'''.*?'''|'.*?'", Pattern.DOTALL);
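To see the difference, here is a minimal sketch that runs the question's failing input through the pattern with and without the flag. The class name QuoteDotallDemo and the helper firstMatch are made up for illustration and are not part of the original code. The third pattern uses the embedded (?s) flag, which should behave the same as passing Pattern.DOTALL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteDotallDemo {
    // Pattern from the question: by default '.' does not match line terminators.
    static final Pattern DEFAULT_PATTERN = Pattern.compile("'''.*?'''|'.*?'");

    // Same pattern with DOTALL, so '.' also matches \n.
    static final Pattern DOTALL_PATTERN =
            Pattern.compile("'''.*?'''|'.*?'", Pattern.DOTALL);

    // Same behavior expressed with the embedded flag (?s).
    static final Pattern INLINE_PATTERN = Pattern.compile("(?s)'''.*?'''|'.*?'");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String value = "'''<html>\n<head></head>\n</html>'''";

        // The triple-quote alternative cannot cross the \n, so the regex falls
        // back to the single-quote alternative and matches only the first two quotes.
        System.out.println(firstMatch(DEFAULT_PATTERN, value));

        // With DOTALL the triple-quote alternative matches the whole block.
        System.out.println(firstMatch(DOTALL_PATTERN, value));
        System.out.println(firstMatch(INLINE_PATTERN, value));
    }
}
```

One thing to keep in mind: with DOTALL the single-quote alternative '.*?' can also span lines, so a stray apostrophe may pair up with one on a later line. Whether that matters depends on the input you expect.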
Upvotes: 3