Reputation: 929
The following regex works when testing here, but when I use it in my Java code, it doesn't return a match. It uses a negative lookahead to ensure that no blank line (two consecutive newlines) occurs between MAIN LEVEL and Bedrooms. Why won't it work in Java?
regex
^\s*\bMAIN LEVEL\b\n(?:(?!\n\n)[\s\S])*\bBedrooms:\s*(.*)
Java
pattern = Pattern.compile("^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");
match = pattern.matcher(content);
if(match.find())
{
//Doesn't reach here
String bed = match.group(1);
bed = bed.trim();
}
content is just a string read from a text file, which contains the exact text shown in the demo linked above.
File file = new File("C:\\Users\\ME\\Desktop\\content.txt");
content = new Scanner(file).useDelimiter("\\Z").next();
UPDATE:
I changed my code to include a multiline modifier (?m), but it prints out "null".
pattern = Pattern.compile("(?m)^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");
match = pattern.matcher(content);
if(match.find())
{ // Still not reaching here
mainBeds=match.group(1);
mainBeds= mainBeds.trim();
}
System.out.println(mainBeds); // Prints null
Upvotes: 1
Views: 1238
Reputation: 6441
As explained in Alan Moore's answer, it's a mismatch between the line-separator format used in your file (\r\n) and the one your pattern specifies (\n):
Original code:
Pattern.compile("^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");
Note: I explain what \r and \n represent, and the context and difference between \r\n and \n, in the second item of the "side notes" section.
Most/all Java versions:
You can use \r?\n to match both formats, and this is sufficient in most cases.
Most/all Java versions:
You can use \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029] to match "Any Unicode linebreak sequence".
Java 8 and later:
You can use the Linebreak Matcher (\R). It is equivalent to the second method (above), and whenever possible (Java 8 or later), this is the recommended method.
Resulting code (3rd method):
Pattern.compile("^\\s*\\bMAIN LEVEL\\b\\R(?:(?!\\R\\R)[\\s\\S])*\\bBedrooms:\\s*(.*)");
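As a minimal sketch of the first method, the class below runs the question's pattern and a \r?\n variant against a made-up CR-LF sample. The sample text, the class name LineBreakRegexDemo, and the helper firstGroup are assumptions for illustration, not the asker's real file or code. It sticks to \r?\n because that form is unambiguous; \R can also match a lone \r or \n, so a lookahead such as (?!\R\R) may behave differently on a single \r\n and is worth checking on your own JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakRegexDemo {
    // Pattern from the question: only accepts bare \n line breaks.
    static final Pattern LF_ONLY = Pattern.compile(
        "^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");

    // First method: \r?\n accepts both CR-LF and LF line breaks.
    static final Pattern CRLF_OR_LF = Pattern.compile(
        "^\\s*\\bMAIN LEVEL\\b\\r?\\n(?:(?!\\r?\\n\\r?\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");

    // Returns the trimmed first capture group, or null when nothing matches.
    static String firstGroup(Pattern pattern, String content) {
        Matcher m = pattern.matcher(content);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample with Windows-style (CR-LF) line breaks.
        String crlf = "MAIN LEVEL\r\nBedrooms: 3\r\nBathrooms: 2\r\n\r\n"
                    + "UPPER LEVEL\r\nBedrooms: 2\r\n";

        System.out.println(firstGroup(LF_ONLY, crlf));    // prints null
        System.out.println(firstGroup(CRLF_OR_LF, crlf)); // prints 3
    }
}
```

The same \r?\n pattern also matches when the sample uses plain \n line breaks, which is why it covers the two common formats.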
Side notes:
You can replace \\R\\R with \\R{2}, which is more readable.
Different line-break formats exist, and are used in different systems, because early OSs inherited their "line-break logic" from mechanical typing machines such as typewriters.
The \r in code represents a Carriage-Return, aka CR. The idea behind it is to return the typing cursor to the start of the line.
The \n in code represents a Line-Feed, aka LF. The idea behind it is to move the typing cursor to the next line.
The most common line-break formats are CR-LF (\r\n), used primarily by Windows, and LF (\n), used by most UNIX-like systems. This is why \r?\n "will be sufficient in most cases", and you can reliably use it for systems intended for household-grade users.
However, some (rare) OSs, usually in industrial-grade systems such as servers, may use CR, LF-CR, or something else entirely, which is why the second method lists so many characters. If you need the code to be compatible with every system, you will need the second or, preferably, the third method.
Here is a useful method for testing where your patterns are failing:
String content = "..."; //Replace "..." with your content.
String patternString = "..."; //Replace "..." with your pattern.
String lastPatternSuccess = "None. You suck at Regex!";
for (int i = 0; i <= patternString.length(); i++) {
try {
String patternSubstring = patternString.substring(0, i);
Pattern pattern = Pattern.compile(patternSubstring);
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
lastPatternSuccess = i + " - Pattern: " + patternSubstring + " - Match: \n" + matcher.group();
}
} catch (Exception ex) {
//Ignore and jump to next
}
}
System.out.println(lastPatternSuccess);
Upvotes: 5
Reputation: 75222
It's the line separators. You're looking for \n, but your file actually uses \r\n. If you're running Java 8, you can change every \\n in your code to \\R (the universal line separator). For Java 7 or earlier, use \\r?\\n.
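A different approach, not proposed in either answer, is to leave the pattern untouched and normalize the line breaks in the content before matching. The sketch below assumes that rewriting \r\n and lone \r to \n is acceptable for your data; the class name, the normalize and mainBedrooms helpers, and the sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NormalizeLineEndings {
    // The question's original pattern, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
        "^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");

    // Rewrites CR-LF and lone CR line breaks to a single LF.
    static String normalize(String content) {
        return content.replaceAll("\\r\\n?", "\n");
    }

    // Returns the trimmed bedroom value, or null when nothing matches.
    static String mainBedrooms(String content) {
        Matcher m = ORIGINAL.matcher(normalize(content));
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample with Windows-style line breaks.
        String crlf = "MAIN LEVEL\r\nBedrooms: 3\r\nBathrooms: 2\r\n\r\n"
                    + "UPPER LEVEL\r\nBedrooms: 2\r\n";
        System.out.println(mainBedrooms(crlf)); // prints 3
    }
}
```

The trade-off is that the matched offsets then refer to the normalized string, not to the original file content.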
Upvotes: 2