Reputation: 926
I have the following text
CHAPTER 1
Introduction
CHAPTER OVERVIEW
I created and tested (on http://regexr.com/) the following regex for it:
(CHAPTER\s{1}\d\n)
However, when I use the following code in Java, it fails:
String text = stripper.getText(document); // the text above
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
if (m.find()) {
    // do action
}
m.find() always returns false.
Upvotes: 1
Views: 57
Reputation: 785631
Your document may contain DOS-style line endings, with a carriage return \r before each \n. You can use either of these patterns:
Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\R");
\R (requires Java 8) matches any line break sequence, including \r\n, a lone \r, or a lone \n, after your digits. Or just use:
Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\s");
since \s also matches any whitespace, including newline characters.
Another alternative is to use the MULTILINE flag with the anchor $:
Pattern p = Pattern.compile("(?m)CHAPTER\\s+\\d+$");
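To see the difference side by side, here is a minimal sketch that runs the original pattern and the three alternatives against the question's text. It assumes the extracted text uses \r\n line endings, which is a guess about what stripper.getText(document) returns; the class and method names are only illustrative.

```java
import java.util.regex.Pattern;

class LineEndingDemo {
    // Returns true if the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Assumed sample: the question's text with DOS (\r\n) line endings.
        String text = "CHAPTER 1\r\nIntroduction\r\nCHAPTER OVERVIEW";

        // Original pattern: \n cannot match the \r that follows the digit.
        System.out.println(found("(CHAPTER\\s{1}\\d\\n)", text));  // false

        // \R consumes the whole \r\n sequence (Java 8+).
        System.out.println(found("CHAPTER\\s+\\d+\\R", text));     // true

        // \s matches the \r directly after the digit.
        System.out.println(found("CHAPTER\\s+\\d+\\s", text));     // true

        // In MULTILINE mode, $ matches just before the line terminator.
        System.out.println(found("(?m)CHAPTER\\s+\\d+$", text));   // true
    }
}
```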
Upvotes: 3
Reputation: 1583
The problem is in your source text. I think you forgot about the newlines, because this:
String text = "CHAPTER 1\n" +
"Introduction\n" +
"CHAPTER OVERVIEW";
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
System.out.println(m.find());
prints true. I copied the string body from your question, and IntelliJ added the newlines there. Try debugging to see what stripper.getText(document) really returns.
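One way to check this is to print the text with its control characters made visible. The helper below is a hypothetical sketch (the class and method names are not from any library); pass it the result of stripper.getText(document) in place of the sample string.

```java
class ControlCharDump {
    // Replaces control characters with visible escapes so that
    // stray \r or \t characters show up in the console or debugger.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\r': sb.append("\\r"); break;
                case '\n': sb.append("\\n"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Any other control character as a \\uXXXX escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample; replace with stripper.getText(document).
        String text = "CHAPTER 1\r\nIntroduction";
        System.out.println(escape(text)); // prints CHAPTER 1\r\nIntroduction
    }
}
```

If the output shows \r\n (or extra spaces) after the digit, that explains why a pattern ending in \n alone does not match.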
You can also pass a flag such as Pattern.MULTILINE as the second parameter of Pattern.compile.
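A short sketch of the two-argument form, assuming you want to match the heading as a whole line; the ^ and $ anchors and the class name are illustrative choices, not part of the original pattern.

```java
import java.util.regex.Pattern;

class MultilineFlagDemo {
    // With MULTILINE, ^ and $ match at the start and end of every line,
    // not only at the start and end of the whole input.
    static final Pattern HEADING =
            Pattern.compile("^CHAPTER\\s\\d$", Pattern.MULTILINE);

    static boolean hasHeading(String text) {
        return HEADING.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasHeading("CHAPTER 1\r\nIntroduction"));  // true
        System.out.println(hasHeading("Intro\nCHAPTER 2\nText"));     // true
        System.out.println(hasHeading("CHAPTER OVERVIEW"));           // false
    }
}
```

Because $ matches before the line terminator without consuming it, this form works whether the line ends in \n or \r\n.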
Upvotes: 0