Milkakuh

Reputation: 73

Java Regex Multiline issue

I have a String, read from a file with Apache Commons IO's FileUtils.readFileToString, that has the following format:

<!--LOGHEADER[START]/-->
<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->
<!--LOGHEADER[END]/-->
#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)

I am trying to filter out everything between the LOGHEADER[START] and LOGHEADER[END] lines, so I wrote this Java regex:

String fileContent = FileUtils.readFileToString(file);
String logheader = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());

(I use DOTALL because the header spans several lines and I want . to match line breaks as well.) However, this pattern does not match the string. If I remove the LOGHEADER\[END\] part of the regex, I get a match that contains the whole string. I don't understand why the original regex doesn't match.
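The two flags are easy to mix up, so here is a minimal sketch on a hypothetical two-line string (the class name DotallVsMultiline and the sample text are illustrative only). DOTALL makes . match line terminators; MULTILINE only changes where ^ and $ can match, so on its own it would not let the pattern span the header lines.

```java
import java.util.regex.Pattern;

public class DotallVsMultiline {
    // Hypothetical two-line input standing in for the log header
    static final String TEXT = "A[START]\nB[END]";

    // Returns true if "START.*END" is found in TEXT when compiled with the given flags
    static boolean spansLines(int flags) {
        return Pattern.compile("START.*END", flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        // No flags: '.' stops at '\n', so the pattern cannot reach END on the next line
        System.out.println(spansLines(0));                 // false
        // DOTALL: '.' also matches line terminators
        System.out.println(spansLines(Pattern.DOTALL));    // true
        // MULTILINE: only ^ and $ change meaning; '.' still stops at '\n'
        System.out.println(spansLines(Pattern.MULTILINE)); // false
        // MULTILINE lets ^ match right after a line break
        System.out.println(Pattern.compile("^B", Pattern.MULTILINE).matcher(TEXT).find()); // true
    }
}
```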

Any help is appreciated - thanks a lot!

Upvotes: 2

Views: 164

Answers (2)

Wiktor Stribiżew

Reputation: 626699

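The important thing to remember about Java's matches() method is that the regular expression must match the entire input string, not just a part of it. Your input has <!-- before LOGHEADER[START] and more text after LOGHEADER[END], so matches() returns false.

So use find() instead, which looks for the pattern anywhere in the input. This captures everything between <!--LOGHEADER[START]/--> and <!--LOGHEADER[END]/-->: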

String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
while (m.find()) {
    System.out.println(m.group());
}
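For reference, a self-contained sketch of the find() approach, assuming an abbreviated inline sample instead of the real file (HeaderBetweenMarkers and extractHeader are hypothetical names). The returned text still contains the line break after the START line and the one before the END line; call trim() on it if you don't want them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderBetweenMarkers {
    // Returns the text between the START and END marker lines, or null if no header is found
    static String extractHeader(String fileContent) {
        // Lookbehind/lookahead keep the markers themselves out of the match
        String logheader = "(?<=LOGHEADER\\[START\\]/-->).*(?=<!--LOGHEADER\\[END\\])";
        Matcher m = Pattern.compile(logheader, Pattern.DOTALL).matcher(fileContent);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Abbreviated sample input (hypothetical); the real content comes from the log file
        String fileContent = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#2015 03 04 11:04:19:687#+0100#Debug#...";
        System.out.print(extractHeader(fileContent));
    }
}
```

Because .* is greedy, a file containing several headers would give a single match running from the first START marker to the last END marker; .*? would instead give one match per header.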

Or, to keep your approach of using matches(), add ^.* and .*$ so that the pattern also consumes the text before LOGHEADER[START] and after LOGHEADER[END]:

String logheader = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";
Pattern p = Pattern.compile(logheader, Pattern.DOTALL);
Matcher m = p.matcher(fileContent);
System.out.println(m.matches());
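A minimal sketch comparing matches(), lookingAt() and find() on an abbreviated, hypothetical sample, using the pattern from the question and the anchored variant above (MatchesVsFind and its helper methods are illustrative names). A fresh Matcher is created for each call so that earlier calls don't influence later ones.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {
    // The pattern from the question
    static final String QUESTION_REGEX = "LOGHEADER\\[START\\].*LOGHEADER\\[END\\]";
    // The matches()-friendly variant with .* at both ends
    static final String ANCHORED_REGEX = "^.*LOGHEADER\\[START\\].*LOGHEADER\\[END\\].*$";

    // Abbreviated sample input (hypothetical)
    static final String CONTENT = "<!--LOGHEADER[START]/-->\n"
            + "<!--ENCODING[UTF8]/-->\n"
            + "<!--LOGHEADER[END]/-->\n"
            + "#2.0#2015 03 04 11:04:19:687#+0100#Debug#...";

    // matches(): the pattern must consume the entire input
    static boolean matches(String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(CONTENT).matches();
    }

    // lookingAt(): the pattern must match starting at the beginning of the input
    static boolean lookingAt(String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(CONTENT).lookingAt();
    }

    // find(): the pattern may match anywhere in the input
    static boolean find(String regex) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(CONTENT).find();
    }

    public static void main(String[] args) {
        System.out.println(matches(QUESTION_REGEX));   // false: "<!--" precedes the match, "/-->..." follows it
        System.out.println(lookingAt(QUESTION_REGEX)); // false: the input starts with "<!--", not "LOGHEADER"
        System.out.println(find(QUESTION_REGEX));      // true
        System.out.println(matches(ANCHORED_REGEX));   // true: the leading and trailing .* consume the rest
    }
}
```

With matches() the ^ and $ anchors are redundant, since the whole input has to match anyway; the leading and trailing .* are what make the difference.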

Upvotes: 1

Avinash Raj

Reputation: 174696

You need to use the Pattern and Matcher classes together with the find() method. The regex below fetches all the lines that lie between LOGHEADER[START] and LOGHEADER[END].

String s = "<!--LOGHEADER[START]/-->\n" + 
        "<!--HELP[Manual modification of the header may cause parsing problem!]/-->\n" + 
        "<!--LOGGINGVERSION[2.0.7.1006]/-->\n" + 
        "<!--NAME[./log/defaultTrace_00.trc]/-->\n" + 
        "<!--PATTERN[defaultTrace_00.trc]/-->\n" + 
        "<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->\n" + 
        "<!--ENCODING[UTF8]/-->\n" + 
        "<!--FILESET[0, 20, 10485760]/-->\n" + 
        "<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->\n" + 
        "<!--NEXTFILE[defaultTrace_00.1.trc]/-->\n" + 
        "<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->\n" + 
        "<!--LOGHEADER[END]/-->\n" + 
        "#2.0#2015 03 04 11:04:19:687#+0100#Debug#...(few lines to follow)";
Matcher m = Pattern.compile("(?s)\\bLOGHEADER\\[START\\][^\\n]*\\n(.*?)\\n[^\\n]*\\bLOGHEADER\\[END\\]").matcher(s);
while (m.find()) {
    // group(1) holds only the lines between the two marker lines
    System.out.println(m.group(1));
}

Output:

<!--HELP[Manual modification of the header may cause parsing problem!]/-->
<!--LOGGINGVERSION[2.0.7.1006]/-->
<!--NAME[./log/defaultTrace_00.trc]/-->
<!--PATTERN[defaultTrace_00.trc]/-->
<!--FORMATTER[com.sap.tc.logging.ListFormatter]/-->
<!--ENCODING[UTF8]/-->
<!--FILESET[0, 20, 10485760]/-->
<!--PREVIOUSFILE[defaultTrace_00.19.trc]/-->
<!--NEXTFILE[defaultTrace_00.1.trc]/-->
<!--ENGINEVERSION[7.31.3301.368426.20141205114648]/-->

If you also want to match the LOGHEADER lines themselves, you don't need a capturing group:

Matcher m = Pattern.compile("(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
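If "filter out" means removing the header and keeping the log entries, the same kind of pattern can be passed to String.replaceFirst. A minimal sketch, assuming a hypothetical helper name stripHeader and an abbreviated sample string; the optional trailing \\n? also removes the line break after the END line.

```java
public class StripHeader {
    // Removes everything from the START marker line through the END marker line,
    // plus the line break that follows the END line (if any)
    static String stripHeader(String content) {
        return content.replaceFirst(
                "(?s)[^\\n]*\\bLOGHEADER\\[START\\].*?\\bLOGHEADER\\[END\\][^\\n]*\\n?", "");
    }

    public static void main(String[] args) {
        // Abbreviated sample input (hypothetical)
        String s = "<!--LOGHEADER[START]/-->\n"
                + "<!--ENCODING[UTF8]/-->\n"
                + "<!--LOGHEADER[END]/-->\n"
                + "#2.0#2015 03 04 11:04:19:687#+0100#Debug#...";
        System.out.println(stripHeader(s)); // #2.0#2015 03 04 11:04:19:687#+0100#Debug#...
    }
}
```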

Upvotes: 0
