WiredCoder

Reputation: 926

RegEx expression not matching

I have the following text:

CHAPTER 1
Introduction
CHAPTER OVERVIEW 

for which I created and tested (at http://regexr.com/) the following regex:

(CHAPTER\s{1}\d\n)

However, when I use the following code in Java, it fails:

String text = stripper.getText(document); // the text above
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
if (m.find()) {
    // do action
}

m.find() always returns false.

Upvotes: 1

Views: 57

Answers (2)

anubhava

Reputation: 785631

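Your document may use DOS line endings, which put a carriage return \r before the line feed \n. In that case the \n in your pattern never directly follows the digit. You can use either of these patterns: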

Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\R");

\R (requires Java 8) matches any line break sequence after your digits, including \r\n, \n, or \r on its own. Alternatively, just use:

Pattern p = Pattern.compile("CHAPTER\\s+\\d+\\s");

since \s matches any whitespace character, including \r and \n.

Another alternative is to use the MULTILINE flag with the $ anchor:

Pattern p = Pattern.compile("(?m)CHAPTER\\s+\\d+$");
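
To see the difference side by side, here is a minimal sketch that runs the original pattern and the three alternatives against the question's text. It assumes the text uses \r\n line endings, which the question does not confirm; the class name and the matches helper are made up for illustration.

```java
import java.util.regex.Pattern;

public class LineEndingDemo {
    // Hypothetical helper: true if the regex finds a match anywhere in the text.
    static boolean matches(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Assumption: the extracted text uses Windows-style \r\n line endings.
        String text = "CHAPTER 1\r\nIntroduction\r\nCHAPTER OVERVIEW";

        // Original pattern: \n cannot match because \r follows the digit.
        System.out.println(matches("(CHAPTER\\s{1}\\d\\n)", text));  // false
        // \R consumes the whole \r\n sequence.
        System.out.println(matches("CHAPTER\\s+\\d+\\R", text));     // true
        // \s matches the \r.
        System.out.println(matches("CHAPTER\\s+\\d+\\s", text));     // true
        // In MULTILINE mode, $ matches just before the \r\n line terminator.
        System.out.println(matches("(?m)CHAPTER\\s+\\d+$", text));   // true
    }
}
```

If your text turns out to use plain \n line endings, all four patterns match, so the problem lies somewhere else.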

Upvotes: 3

Hrabosch

Reputation: 1583

Your problem is in your source text. I think you are overlooking how its line breaks are encoded, because this:

String text = "CHAPTER 1\n" +
        "Introduction\n" +
        "CHAPTER OVERVIEW";
Pattern p = Pattern.compile("(CHAPTER\\s{1}\\d\\n)");
Matcher m = p.matcher(text);
System.out.println(m.find());

prints true. The string body was copied from your question, and IntelliJ added the \n line breaks. Try debugging what stripper.getText(document) really returns. You can also pass a flag such as Pattern.MULTILINE as the second parameter of Pattern.compile. More info here.
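
One way to check what the text really contains is to print it with the control characters made visible. This is only a sketch: the escape helper and the class name are hypothetical, and the sample string stands in for whatever stripper.getText(document) returns in your case.

```java
public class ShowLineEndings {
    // Hypothetical helper: replaces \r, \n and \t with visible escape sequences.
    static String escape(String s) {
        return s.replace("\r", "\\r")
                .replace("\n", "\\n")
                .replace("\t", "\\t");
    }

    public static void main(String[] args) {
        // Stand-in for stripper.getText(document); the \r\n endings are an assumption.
        String text = "CHAPTER 1\r\nIntroduction\r\nCHAPTER OVERVIEW";

        // Prints: CHAPTER 1\r\nIntroduction\r\nCHAPTER OVERVIEW
        System.out.println(escape(text));
    }
}
```

If the output shows \r\n after the chapter number, the \n in your pattern has to allow for the preceding \r.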

Upvotes: 0
