Reputation: 186
Okay, so for a couple of weeks now I've been working on a program that reads a file containing code written in a mini language and then prints each token with a description of what the token is. Part of this mini language is its support for single-line and multi-line comments.
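The regular expression for comments is \{[^\}]*\}
meaning:
\{ an opening curly brace
[^\}]* zero or more characters that are not a closing curly brace
\} a closing curly brace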
Side Note: Comments cannot be nested, meaning that a comment such as {This is a {nested} comment} would not be considered valid, because a comment can only have one closing curly brace. That being said, a comment such as {This is another {comment} would be valid, since it has only one closing curly brace.
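The rule above can be checked directly against the pattern. This is only a minimal sketch: the class name CommentRegexDemo and the helper isComment are hypothetical and not part of my program. It uses matches(), so the whole string has to be exactly one comment.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the Analyzer program.
class CommentRegexDemo {
    // Same pattern as above: '{', any run of characters other than '}', then '}'.
    static final Pattern COMMENT = Pattern.compile("\\{[^\\}]*\\}");

    // True only when the whole string is exactly one comment.
    static boolean isComment(String s) {
        return COMMENT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isComment("{This is a {nested} comment}"));      // false
        System.out.println(isComment("{This is another {comment}"));        // true
        System.out.println(isComment("{This is\na multi-line\ncomment.}")); // true
    }
}
```

The last call suggests that the pattern itself accepts line breaks when it is given the whole comment as one string.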
While testing this program, I ran into an issue: when it reads a file and comes across a multi-line comment, it doesn't recognize the comment as multi-line. Instead of printing the whole comment, it prints the tokens inside the comment. I've spent a good week or week and a half trying to get this to work. I've tried various combinations of regular expressions and different placements of my if statements, but I haven't found a solution. Since I'm not very experienced with regular expressions, I must be missing something pretty obvious.
Here is a snippet of my code.
Side Note: My program takes in the name of the file through user input in another class.
import java.io.*;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Analyzer {
    public void lex(String filename) {
        try {
            File file = new File(filename); // not in the original snippet; added so that `file` is defined
            Scanner scanFile = new Scanner(file);
            while(scanFile.hasNextLine()) {
                String str = scanFile.nextLine();
                String keyword = "(\\bWHILE\\b|\\bENDWHILE\\b|\\bIF\\b|\\bENDIF\\b|\\bPRINT\\b)";
                String comment = "(\\{[^\\}]*\\})";
                String literal = "(\\b[0-9]+\\b)";
                String identifier = "(\\b[a-z]+\\b)";
                String symbol = "((\\()|(\\))|(;))";
                String operator = "((\\+)|(\\-)|(\\*)|(/)|(\\=)|(\\<)|(\\:\\=))";
                String keywordERROR = "(PRINT\\w+)";
                String commentERROR = "(\\{.*\\}.*\\})";
                String literalERROR = "([0-9]+[a-zA-Z_]+)";
                String identERROR = "([a-z]+[A-Z_0-9]+)";
                String alphabetERROR = "(~|`|\\!|@|#|\\$|%|\\^|\\&|_|\\||\\:|'|\"|\\?|\\>|\\.|\\,|\\\\)";
                String regex = keyword + "|" + keywordERROR + "|" + comment + "|" + commentERROR + "|" + literal + "|" + literalERROR
                        + "|" + identifier + "|" + identERROR + "|" + symbol + "|" + operator + "|" + alphabetERROR;
                Pattern pattern = Pattern.compile(regex);
                Matcher matcher = pattern.matcher(str);
                while(matcher.find()) {
                    if(matcher.group(1) != null)
                        System.out.println(matcher.group(1) + "\tKeyword");
                    else if(matcher.group(2) != null)
                        System.out.println(matcher.group(2) + "\tError");
                    if(matcher.group(3) != null)
                        System.out.println(matcher.group(3) + "\tComment");
                    else if(matcher.group(4) != null)
                        System.out.println(matcher.group(4) + "\tError");
                    if(matcher.group(5) != null)
                        System.out.println(matcher.group(5) + "\tLiteral");
                    else if(matcher.group(6) != null)
                        System.out.println(matcher.group(6) + "\tError");
                    if(matcher.group(7) != null)
                        System.out.println(matcher.group(7) + "\tIdentifier");
                    else if(matcher.group(8) != null)
                        System.out.println(matcher.group(8) + "\tError");
                    if(matcher.group(9) != null) {
                        if(matcher.group(10) != null)
                            System.out.println(matcher.group(10) + "\tOpen Parenthesis");
                        if(matcher.group(11) != null)
                            System.out.println(matcher.group(11) + "\tClose Parenthesis");
                        if(matcher.group(12) != null)
                            System.out.println(matcher.group(12) + "\tSemi-colon");
                    }
                    if(matcher.group(13) != null) {
                        if(matcher.group(14) != null)
                            System.out.println(matcher.group(14) + "\tAddition Operator");
                        if(matcher.group(15) != null)
                            System.out.println(matcher.group(15) + "\tSubtraction Operator");
                        if(matcher.group(16) != null)
                            System.out.println(matcher.group(16) + "\tMultiplication Operator");
                        if(matcher.group(17) != null)
                            System.out.println(matcher.group(17) + "\tDivision Operator");
                        if(matcher.group(18) != null)
                            System.out.println(matcher.group(18) + "\tEquality Comparison Operator");
                        if(matcher.group(19) != null)
                            System.out.println(matcher.group(19) + "\tLess Than Operator");
                        if(matcher.group(20) != null)
                            System.out.println(matcher.group(20) + "\tAssignment Operator");
                    }
                    if(matcher.group(21) != null)
                        System.out.println(matcher.group(21) + "\tError");
                }
            }
            scanFile.close();
        } catch(Exception e) {
            e.printStackTrace();
        }
    }
}
Like I said before, I've tried many different ways of solving this issue. One thing I tried was adding line-break sequences to the pattern, like this: \{[^\}]*[\r\n]*\}, \{[\r\n]*[^\}]*\}, \{[\r\n]*[^\}]*[\r\n]*\}, \{[^\}]*\s*\}, \{\s*[^\}]*\s*\}, (?s)\{[^\}]*\} and (?m)\{[^\}]*\}. I also tried the DOTALL and MULTILINE flags for my Pattern object and looked for any tutorial I could find, but I haven't had any luck.
The file that I'm reading from looks like this:
{This is
a multi-line
comment.}
WHILE(x<10)
PRINT x;
x:=x+2;
ENDWHILE
The output should look like this:
{This is a multi-line comment} Comment
WHILE Keyword
( Open Parenthesis
x Identifier
< Less Than Operator
10 Literal
) Close Parenthesis
PRINT Keyword
x Identifier
; Semi-colon
x Identifier
:= Assignment Operator
x Identifier
+ Addition Operator
2 Literal
; Semi-colon
ENDWHILE Keyword
But instead the output looks like this:
is Identifier
a Identifier
multi Identifier
- Subtraction Operator
line Identifier
comment Identifier
. Error
WHILE Keyword
( Open Parenthesis
x Identifier
< Less Than Operator
10 Literal
) Close Parenthesis
PRINT Keyword
x Identifier
; Semi-colon
x Identifier
:= Assignment Operator
x Identifier
+ Addition Operator
2 Literal
; Semi-colon
ENDWHILE Keyword
I'm not sure what I'm doing wrong. Any and all help is greatly appreciated!
Upvotes: 0
Views: 55
Reputation: 9202
You can just keep reading the file with another while loop if a line starts with an open curly brace but doesn't end with a close curly brace, something like this:
while(scanFile.hasNextLine()) {
    String str = scanFile.nextLine().trim(); // trim off indents etc.
    // If the line is blank just read in the next line.
    if (str.equals("")) { continue; }
    // If this is a multi-line comment then
    if (str.startsWith("{") && !str.endsWith("}")) {
        while(scanFile.hasNextLine()) {
            String commentStr = scanFile.nextLine().trim();
            str+= " " + commentStr;
            if (commentStr.endsWith("}")) { break; }
        }
    }
    // Do the rest of your processing....
    // ..................................
    // ..................................
}
On another note... I wouldn't use RegEx to parse this file content, but perhaps you need to for some reason. It's a good RegEx exercise in any case. :)
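To try the joining loop on its own, here is a minimal self-contained sketch. The class name CommentJoiner and the method logicalLines are hypothetical, and the Scanner reads from a String instead of a file so the example runs anywhere. It assumes, like the loop above, that a comment starts at the beginning of a line and that its closing brace is the last character of a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, shown only to illustrate the joining loop.
class CommentJoiner {
    // Returns logical lines: a comment spread over several physical lines
    // is joined into one string, with the pieces separated by single spaces.
    static List<String> logicalLines(Scanner in) {
        List<String> out = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith("{") && !line.endsWith("}")) {
                StringBuilder joined = new StringBuilder(line);
                while (in.hasNextLine()) {
                    String next = in.nextLine().trim();
                    joined.append(' ').append(next);
                    if (next.endsWith("}")) {
                        break; // end of the multi-line comment
                    }
                }
                line = joined.toString();
            }
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        String source = "{This is\na multi-line\ncomment.}\nWHILE(x<10)\nPRINT x;\n";
        for (String line : logicalLines(new Scanner(source))) {
            System.out.println(line);
        }
    }
}
```

Each returned string can then be handed to the existing matcher loop unchanged. A comment that opens or closes in the middle of a line of code would need extra handling that this sketch does not attempt.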
Upvotes: 1
Reputation: 6439
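Java's Pattern (used for regex) has MULTILINE mode disabled by default. You can enable it using (?m) at the start of the regex string, or by passing Pattern.MULTILINE to Pattern.compile. Note, though, that MULTILINE only changes where ^ and $ match, so on its own it cannot make a comment match once Scanner.nextLine() has already split it into separate lines.
AFAICS there is nothing (else) wrong with the \{[^\}]*\} regex. You could probably use \{.*?\} instead, which is slightly more readable, but then you also need DOTALL ((?s)), because . does not match line terminators by default.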
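For what it's worth, the negated class [^\}] matches line terminators without any flag, so the question's pattern can span lines as long as the matcher sees the whole text rather than one line at a time. A minimal sketch, assuming the file content has already been read into a single String (for example with Files.readAllBytes); the class name WholeInputComments and the method findComments are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: matches comments against the whole input at once.
class WholeInputComments {
    // The comment pattern from the question, compiled with no flags.
    static final Pattern COMMENT = Pattern.compile("\\{[^\\}]*\\}");

    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            // Collapse each run of whitespace (including line breaks)
            // into one space so the comment prints on a single line.
            found.add(m.group().replaceAll("\\s+", " "));
        }
        return found;
    }

    public static void main(String[] args) {
        String source = "{This is\na multi-line\ncomment.}\nWHILE(x<10)\n";
        System.out.println(findComments(source)); // [{This is a multi-line comment.}]
    }
}
```

The same idea should carry over to the full token regex, since Matcher.find() simply continues from the end of the previous match across line breaks.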
Upvotes: 0