Nander Speerstra

Reputation: 1526

Multiline RegEx in Java

(My programming question may seem somewhat convoluted, but I see no other solution.)

A text is written in the Eclipse editor. When a self-made Table view plugin for Eclipse is activated, the text quality is checked automatically by a Python script (which I cannot edit) that receives the editor text. The script strips the editor text of whitespace characters such as \n and \t (only the normal space ' ' is kept), because otherwise the sentences cannot be QA checked. When the script is done, it returns the incorrect sentences to the table.

It is possible to click on a sentence in the table, and the plugin then searches the active editor line by line for the clicked sentence. This works for single-line sentences. However, multiline sentences cannot be found in the active editor, because the \n and \t characters are missing from the sentence returned by the script.

To overcome this problem, I changed my search code so that it takes the complete editor text as one string, and tried the following:

String newSentence = tableSentence.replaceAll(" ", "\\s+");
Pattern p = Pattern.compile(newSentence);
Matcher contentMatcher = p.matcher(editorContent); // editorContent is a String
if (contentMatcher.find()) {
  // Get index offset of string and length of string
}

By changing every space into \s+, I hoped to get a match. However, this does not work, because each space ends up replaced by a plain s+ instead of \s+ (the backslash disappears).
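A minimal sketch reproducing this behavior, assuming a hypothetical table sentence "This is a sentence" and hypothetical editor text in which that sentence is broken across a line. The class and method names are illustrative only. It shows that replaceAll consumes the backslash, so the compiled pattern no longer matches.

```java
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    // Same transformation as in the question: replace each space with "\\s+".
    static String buildPattern(String tableSentence) {
        // The Java literal "\\s+" is the three-character string \s+ ,
        // but replaceAll() treats a backslash in the replacement as an escape
        // for the next character, so \s is emitted as a plain "s".
        return tableSentence.replaceAll(" ", "\\s+");
    }

    public static void main(String[] args) {
        String tableSentence = "This is a sentence";     // hypothetical table entry
        String editorContent = "This is\n\ta sentence";   // hypothetical editor text
        String newSentence = buildPattern(tableSentence);
        System.out.println(newSentence); // prints Thiss+iss+as+sentence
        System.out.println(Pattern.compile(newSentence).matcher(editorContent).find()); // prints false
    }
}
```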

So, my question is: how can I adjust the input passed to Pattern.compile? I am inexperienced with Java, so I do not see how to change this. Unfortunately, I cannot change the Python script to also return the full (unstripped) sentences.

Upvotes: 2

Views: 124

Answers (2)

nhahtdh

Reputation: 56819

You need to use "\\\\s+" instead of "\\s+", because \ is the escape character in the regex replacement-string syntax. To put a literal \ into the replacement text, you need to write \\ in the replacement string, and that doubles to "\\\\" because \ must also be escaped in a Java string literal.
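A minimal sketch of the fix applied to the question's search, assuming hypothetical sample text and a hypothetical helper name findSentence. It returns the offset and length that the question's comment asks for. Note that this sketch, like the question's code, does not escape regex metacharacters (such as . or ?) that may appear inside the sentence itself.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineSearch {
    // Returns {start offset, length} of the first match, or null if not found.
    static int[] findSentence(String tableSentence, String editorContent) {
        // "\\\\s+" in source is the string \\s+ ; replaceAll() turns \\ into one
        // literal backslash, so each space becomes the regex \s+ .
        String newSentence = tableSentence.replaceAll(" ", "\\\\s+");
        Matcher contentMatcher = Pattern.compile(newSentence).matcher(editorContent);
        if (contentMatcher.find()) {
            return new int[] {contentMatcher.start(), contentMatcher.end() - contentMatcher.start()};
        }
        return null;
    }

    public static void main(String[] args) {
        String editorContent = "Intro.\nThis is\n\ta sentence that spans lines."; // hypothetical
        int[] hit = findSentence("This is a sentence", editorContent);
        System.out.println(hit[0] + " " + hit[1]); // prints 7 19
    }
}
```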

Note that \ just happens to be the escape character of the regex replacement-string syntax in Java. Other languages, such as JavaScript, use $ to escape $ (written $$), so \ doesn't need to be escaped in a JavaScript regex replacement string.

If you are replacing a match with literal text, you can use Matcher.quoteReplacement to avoid dealing with the escaping rules of the replacement string:

String newSentence = tableSentence.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
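A small sketch, with hypothetical class and method names and a sample sentence, showing that Matcher.quoteReplacement("\\s+") produces exactly the doubled backslash that would otherwise have to be written by hand, so both forms yield the same pattern string.

```java
import java.util.regex.Matcher;

public class QuoteReplacementDemo {
    // Replaces each space with the regex \s+ without hand-escaping the replacement.
    static String viaQuote(String tableSentence) {
        // quoteReplacement() escapes backslashes and dollar signs for replaceAll()
        return tableSentence.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(Matcher.quoteReplacement("\\s+")); // prints \\s+
        String a = viaQuote("This is a sentence");
        String b = "This is a sentence".replaceAll(" ", "\\\\s+");
        System.out.println(a);           // prints This\s+is\s+a\s+sentence
        System.out.println(a.equals(b)); // prints true
    }
}
```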

In this case, since you are searching for a literal string and replacing it with another literal string, you can use String.replace instead, which performs plain (non-regex) string replacement:

String newSentence = tableSentence.replace(" ", "\\s+");
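One caveat neither answer covers: the table sentence is used as a regex, so characters such as . ? ( ) in it are interpreted as regex syntax. The sketch below is an extension, not part of either answer: it quotes each word with Pattern.quote and joins the words with \s+. The class and method names and the sample text are hypothetical, and it assumes the script separates words with single spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralSentenceSearch {
    // Matches the sentence literally, except that each space may match
    // any run of whitespace (spaces, tabs, line breaks) in the editor text.
    static Pattern toPattern(String tableSentence) {
        List<String> parts = new ArrayList<>();
        for (String word : tableSentence.split(" ")) {
            parts.add(Pattern.quote(word)); // escapes ., ?, (, ) and other metacharacters
        }
        return Pattern.compile(String.join("\\s+", parts));
    }

    // Returns {offset, length} of the first match, or null if not found.
    static int[] find(String tableSentence, String editorContent) {
        Matcher m = toPattern(tableSentence).matcher(editorContent);
        return m.find() ? new int[] {m.start(), m.end() - m.start()} : null;
    }

    public static void main(String[] args) {
        String editorContent = "Is this (really)\n\tright? Yes."; // hypothetical
        int[] hit = find("Is this (really) right?", editorContent);
        System.out.println(hit[0] + " " + hit[1]); // prints 0 24
        // The plain replace() version treats "(really)" as a group and "?" as a quantifier:
        String naive = "Is this (really) right?".replace(" ", "\\s+");
        System.out.println(Pattern.compile(naive).matcher(editorContent).find()); // prints false
    }
}
```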

Upvotes: 1

Asunez

Reputation: 2347

Add a third and a fourth backslash to your replacement string, so that it looks like this: "\\\\s+"

Java doesn't have raw (or verbatim) string literals, so every backslash has to be escaped in the source code: \\\\ in the source becomes \\ in the actual string, which replaceAll then turns into a single literal backslash. This solves the problem of s+ being inserted in place of your spaces instead of \s+.

When you type the replacement string in code, it is processed like this:

\\\\s+
 |     # compile time (Java string literal)
 V
\\s+
 |     # replaceAll replacement processing
 V
 \s+   # actual regex used
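A short sketch that prints each stage of the diagram, using a hypothetical class and helper name. It applies replaceAll to a single space so that the output is exactly the processed replacement, then checks that the result works as a whitespace regex.

```java
import java.util.regex.Pattern;

public class EscapeStages {
    // Applies replaceAll's replacement processing to a single space.
    static String afterReplacement(String replacement) {
        return " ".replaceAll(" ", replacement);
    }

    public static void main(String[] args) {
        String replacement = "\\\\s+";              // four backslashes in the source
        System.out.println(replacement);             // prints \\s+
        String regex = afterReplacement(replacement);
        System.out.println(regex);                   // prints \s+
        // The resulting regex matches any run of whitespace, including \n and \t.
        System.out.println(Pattern.matches(regex, "\n\t ")); // prints true
    }
}
```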

Updated my answer according to @nhahtdh's comment (fixed the number of backslashes).

Upvotes: 2
