Filippo Lauria

Reputation: 2064

Regex troubles (probably) with carriage return and newline

I have the following text:

&rule_c(2-7, <<'EOF');
cout << "Hello World.\n";
return x;
EOF

I want to match this text with a regular expression.

The one I came up with is:

^&rule_c\((\d+)\-(\d+),\s?\<\<\s?\'EOF\'\);\r?\n|\r\n?(.*\r?\n|\r\n?)+EOF\r?\n|\r\n?$

I tried it with Java:

private static final String newLine = System.getProperty("line.separator").toString();
 ...
String textual = "&rule_c(2-7, <<'EOF');" + newLine
 + "cout << "Hello World.\n";" + newLine
 + "return x;" + newLine
 + "EOF" + newLine;

String lineSep = "\\r?\\n|\\r\\n?";
String regex = "^&rule_c\\((\\d+)\\-(\\d+),\\s?\\<\\<\\s?\\'EOF\\'\\);"
  + lineSep + "(.*" + lineSep + ")+EOF" + lineSep + "$";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(textual);
if (m.matches()) {
    rangeLowerBound = Integer.parseInt(m.group(1));
    rangeUpperBound = Integer.parseInt(m.group(2));


    String[] tmp = m.group(3).split(lineSep);
    System.out.println(tmp.toString());
    for (String l : tmp)
        System.out.println(l);

    lineSet = new ArrayList<String>();
    Collections.addAll(lineSet, tmp);

} else
    System.out.println("regex doesn't match!");
 ...

The only output I get is "regex doesn't match!".

Where am I going wrong?

Upvotes: 0

Views: 159

Answers (4)

Filippo Lauria

Reputation: 2064

I solved it with String lineSep = "(?:\\r?\\n|\\r\\n?)+"; (and not String lineSep = "[\\r?\\n|\\r\\n?]+";, which also matches literal | and ? characters), combining the answers and suggestions from Pshemo (mainly) and Fedor Skrynnikov.
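To see why the character-class form is too permissive, here is a minimal sketch. The class name CharClassVsGroupDemo and the sample inputs are made up for illustration; only the two patterns come from this answer.

```java
import java.util.regex.Pattern;

class CharClassVsGroupDemo {
    // Inside [...] the characters '?' and '|' lose their special meaning and become literals.
    static final Pattern CHAR_CLASS = Pattern.compile("[\\r?\\n|\\r\\n?]+");
    // Inside (?:...) they keep their meaning: optional CR, alternation.
    static final Pattern GROUP = Pattern.compile("(?:\\r?\\n|\\r\\n?)+");

    public static void main(String[] args) {
        // The character class accepts text that is not a line break at all.
        System.out.println(CHAR_CLASS.matcher("|?").matches()); // true
        System.out.println(GROUP.matcher("|?").matches());      // false

        // Both accept real line breaks, including several in a row.
        System.out.println(CHAR_CLASS.matcher("\r\n\n").matches()); // true
        System.out.println(GROUP.matcher("\r\n\n").matches());      // true
    }
}
```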

I also used Bohemian's suggestion to remove the unnecessary character escaping.

Here is the example in gskinner.com's RegEx Tester.
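For reference, a sketch of how the pieces from the answers could fit together in Java. It is not the exact code I ended up with: it uses the non-repeating separator from Pshemo's answer, and the names RuleBlockMatchDemo, RULE and bodyLines are invented for this example. One change goes beyond what the answers suggest: the body is wrapped in an extra capturing group, because in Java a repeated group such as (...)+ only keeps its last iteration, so group 3 would otherwise hold just the final body line.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleBlockMatchDemo {
    // Non-capturing, so it does not shift the group numbers.
    static final String LINE_SEP = "(?:\\r?\\n|\\r\\n?)";

    // Group 1/2: range bounds. Group 3: all body lines, including their separators.
    static final Pattern RULE = Pattern.compile(
        "^&rule_c\\((\\d+)-(\\d+),\\s?<<\\s?'EOF'\\);" + LINE_SEP
        + "((?:.*" + LINE_SEP + ")+)EOF" + LINE_SEP + "$");

    // Returns the body lines, or null when the text does not match.
    static String[] bodyLines(String textual) {
        Matcher m = RULE.matcher(textual);
        if (!m.matches()) {
            return null;
        }
        return m.group(3).split(LINE_SEP);
    }

    public static void main(String[] args) {
        String nl = System.lineSeparator();
        String textual = "&rule_c(2-7, <<'EOF');" + nl
            + "cout << \"Hello World.\\n\";" + nl
            + "return x;" + nl
            + "EOF" + nl;

        Matcher m = RULE.matcher(textual);
        if (m.matches()) {
            System.out.println(Integer.parseInt(m.group(1))); // 2
            System.out.println(Integer.parseInt(m.group(2))); // 7
            System.out.println(Arrays.toString(bodyLines(textual)));
        } else {
            System.out.println("regex doesn't match!");
        }
    }
}
```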

Upvotes: 0

Pshemo

Reputation: 124225

The | in \\r?\\n|\\r\\n? splits your entire regex into separate alternatives, regex1|regex2. To solve this problem, put the line separator in parentheses. Also, since you don't want it to affect your group numbering, you can use (?:...) to create a non-capturing group.

So change

String lineSep = "\\r?\\n|\\r\\n?";

to

String lineSep = "(?:\\r?\\n|\\r\\n?)";
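A small sketch of the difference, assuming a simplified pattern A + lineSep + B instead of the full regex (the class and method names are illustrative only):

```java
import java.util.regex.Pattern;

class LineSepGroupingDemo {
    // True when the whole input matches "A" + lineSep + "B".
    static boolean matchesTwoLines(String lineSep, String input) {
        return Pattern.compile("A" + lineSep + "B").matcher(input).matches();
    }

    public static void main(String[] args) {
        String bare = "\\r?\\n|\\r\\n?";        // alternation leaks into the surrounding regex
        String grouped = "(?:\\r?\\n|\\r\\n?)"; // alternation confined to the separator

        // Ungrouped, the pattern is really  A\r?\n | \r\n?B
        System.out.println(matchesTwoLines(bare, "A\nB"));      // false
        System.out.println(matchesTwoLines(bare, "A\n"));       // true (first alternative alone)

        // Grouped, it is  A(?:\r?\n|\r\n?)B  as intended.
        System.out.println(matchesTwoLines(grouped, "A\nB"));   // true
        System.out.println(matchesTwoLines(grouped, "A\r\nB")); // true
    }
}
```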

BTW, to print the contents of an array you should use Arrays.toString(yourArray), not yourArray.toString(), so maybe change

System.out.println(tmp.toString())

to

System.out.println(Arrays.toString(tmp))
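As a quick illustration (the input string and the names ArrayPrintDemo and splitLines are hypothetical), calling toString() directly on an array prints its type and identity hash, while Arrays.toString prints the elements:

```java
import java.util.Arrays;

class ArrayPrintDemo {
    // Splits on any of \n, \r\n or \r using the grouped separator.
    static String[] splitLines(String text) {
        return text.split("(?:\\r?\\n|\\r\\n?)");
    }

    public static void main(String[] args) {
        String[] tmp = splitLines("a\nb\r\nc");
        // Object.toString(): the type tag "[Ljava.lang.String;" followed by @ and a hash.
        System.out.println(tmp.toString());
        // Element-wise rendering.
        System.out.println(Arrays.toString(tmp)); // [a, b, c]
    }
}
```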

Upvotes: 1

Bohemian

Reputation: 425003

Use \s to match the newlines: it matches \r and \n as well as spaces and tabs. The "multiline" regex switch (?m) additionally lets ^ and $ match at line boundaries:

String regex = "(?m)^&rule_c\\((\\d+)-(\\d+),\\s?<<\\s?'EOF'\\);\\s(.*\\s)+EOF\\s$";

I also removed the unnecessary escaping of <, - and '.
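It may help to check what each piece contributes on its own. The sketch below uses made-up inputs and a hypothetical helper named found; in java.util.regex, \s matches a line feed with or without (?m), while (?m) changes where ^ and $ can match. The last two lines compare the escaped and unescaped spellings.

```java
import java.util.regex.Pattern;

class MultilineFlagDemo {
    // True when the pattern is found anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String text = "a\nb\nc";

        // \s matches the line feed regardless of the flag.
        System.out.println(found("a\\sb", text));     // true
        System.out.println(found("(?m)a\\sb", text)); // true

        // (?m) affects the anchors: ^ and $ also match at line boundaries.
        System.out.println(found("^b$", text));       // false
        System.out.println(found("(?m)^b$", text));   // true

        // Escaping <, - and ' is harmless but not needed outside a character class.
        System.out.println(found("2\\-7, \\<\\<\\'EOF\\'", "2-7, <<'EOF'")); // true
        System.out.println(found("2-7, <<'EOF'", "2-7, <<'EOF'"));           // true
    }
}
```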

Upvotes: 0

george_h

Reputation: 1592

I think your issue was in the line separator. The code below, based on your sample, worked for me. The strings were also not properly escaped: I had to escape the double quotes in your example.

final String newLine = System.getProperty("line.separator").toString();

StringBuilder sb = new StringBuilder();
sb.append("&rule_c(2-7, <<'EOF');");
sb.append(newLine);
sb.append("cout << \"Hello World.\n\";");
sb.append(newLine);
sb.append("return x;");
sb.append(newLine);
sb.append("EOF");
sb.append(newLine);
String textual = sb.toString();

String lineSep = "(\r?\n|\r\n?)";
String regex = "\\&rule_c\\(2\\-7, <<'EOF'\\);"+lineSep+"cout << \"Hello World.\\n\";"+lineSep+"return x;"+lineSep+"EOF"+lineSep;

Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(textual);
if (m.matches()) {
    System.out.println("regex matches!");

}
else {
    System.out.println("regex doesn't match!");
}

Upvotes: 0
