LeO

Reputation: 5258

Regular expression - multiline in Java

I have an arbitrary string, e.g.

// Java 15+ text block; backslashes must be escaped inside it
String testString = """
This is my "test" case
with lines
\\section{new section}
Another incorrect test"
\\section{next section}
With some more "text"
\\subsection{next section}
With some more "text1"
""";

I use LaTeX and want to replace the straight quotes with the ones used in books, similar to ,, and ´´. For this I need to replace each opening quote with \glqq and each closing quote with \grqq, but only inside a block that starts with \.?section.

If I try the following

String pattern1 = "(^\\\\.?section\\{.+\\})[\\s\\S]*(\\\"(.+)\\\")";
Pattern p = Pattern.compile(pattern1, Pattern.MULTILINE);
Matcher m = p.matcher(testString);
System.out.println(p.matcher(testString).find()); //true

while (m.find()) {
  for (int i = 0; i < 4; i++) {
    System.out.println("Index: " + i);
    System.out.println(m.group(i).replaceAll("\"([\\w]+)\"", "\u00AB$1\u00BB"));
  }
}

I get the following result on the console:

true
Index: 0
\section{new section}
Another incorrect test"
\section{next section}
With some more «text1»
Index: 1
\section{new section}
Index: 2
«text1»
Index: 3
text1

My problems with the current approach:

  1. The first valid match ("text") isn't found. I guess it has to do with the multiline handling and the incorrect grouping of \section{. The search for quotes should be restricted to a block which starts with \section and ends at the next \.?section. How can I do this correctly?
  2. Even when the text is found properly, how do I get a complete string with the replacements?

Upvotes: 1

Views: 79

Answers (1)

Wiktor Stribiżew

Reputation: 626926

You may match each stretch of text from one section to the next section or the end of the string, and replace all "..." strings inside it with «...».

Here is the Java snippet:

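import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "This is my \"test\" case\nwith lines\n\\section{new section}\nAnother incorrect test\"\n\\section{next section}\nWith some more \"text\"\n\\subsection{next section}\nWith some more \"text1\"";
StringBuffer result = new StringBuffer();
// Each match runs from "section" up to the next "section" or the end of the string
Matcher m = Pattern.compile("(?s)section.*?(?=section|$)").matcher(s);
while (m.find()) {
    // Replace paired quotes inside the current block only
    String out = m.group(0).replaceAll("\"([^\"]*)\"", "«$1»");
    // quoteReplacement keeps the backslashes in the block literal
    m.appendReplacement(result, Matcher.quoteReplacement(out));
}
m.appendTail(result); // copy whatever follows the last match
System.out.println(result.toString());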

Output:

This is my "test" case
with lines
\section{new section}
Another incorrect test"
\section{next section}
With some more «text»
\subsection{next section}
With some more «text1»
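
The question asked for the LaTeX commands \glqq and \grqq rather than guillemets. Below is a minimal sketch of that variation, which keeps the block pattern from the snippet above and changes only the replacement. The class name GermanQuotes, the method name convert, and the trailing {} after each command are my own assumptions, not part of the original code. It needs Java 9+ for Matcher.replaceAll(Function).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GermanQuotes {
    // Same block pattern as in the answer: from "section" up to the next
    // "section" substring or the end of the string.
    private static final Pattern BLOCK = Pattern.compile("(?s)section.*?(?=section|$)");
    // A pair of straight double quotes with no other quote in between.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Hypothetical helper: rewrites "..." to \glqq{}...\grqq{} inside each block.
    static String convert(String input) {
        return BLOCK.matcher(input).replaceAll(block ->
            // The lambda result is treated as a replacement string, so it is
            // quoted to keep the backslashes of the block literal.
            Matcher.quoteReplacement(
                QUOTED.matcher(block.group()).replaceAll("\\\\glqq{}$1\\\\grqq{}")));
    }

    public static void main(String[] args) {
        String s = "Intro \"kept\"\n\\section{One}\nSome \"text\" here";
        System.out.println(convert(s));
    }
}
```

Text before the first section is left alone, so "kept" keeps its straight quotes while "text" becomes \glqq{}text\grqq{}.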

The pattern means:

  • (?s) - Pattern.DOTALL embedded flag option
  • section - a section substring
  • .*? - any zero or more chars (line breaks included, because of (?s)), as few as possible
  • (?=section|$) - a positive lookahead that requires a section substring or end of string to appear immediately to the right of the current location.
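
One caveat with the pattern as explained: a bare section also matches inside a title such as {new section} or inside running text, so the input is cut into more pieces than there are headings. That is harmless for the sample, but a quoted phrase containing the word "section" would be split across two pieces and missed. The sketch below compares the original pattern with a tighter variant anchored on the command itself. The names SectionBlocks, LOOSE, STRICT and blocks, and the use of (?:sub)* and \z, are my own assumptions rather than part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SectionBlocks {
    // The pattern from the answer: any "section" substring starts a block.
    static final Pattern LOOSE = Pattern.compile("(?s)section.*?(?=section|$)");
    // Hypothetical tighter variant: a block starts only at \section{, \subsection{,
    // \subsubsection{ and ends before the next such command or at the end of input.
    static final Pattern STRICT = Pattern.compile(
        "(?s)\\\\(?:sub)*section\\{.*?(?=\\\\(?:sub)*section\\{|\\z)");

    // Collects every match of the given pattern in input, in order.
    static List<String> blocks(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\\section{new section}\nSee \"the section on x\"\n\\subsection{b}\nmore";
        System.out.println(blocks(LOOSE, s).size());  // 4 pieces, the quoted phrase is split
        System.out.println(blocks(STRICT, s).size()); // 2 pieces, one per heading
    }
}
```

With the stricter pattern, the quote replacement from the answer can be applied to each block unchanged.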

Upvotes: 1
