Reputation: 1
As a beginner with regular expressions, I am trying to use them to extract the text from C-style and JavaDoc comments in a Java source file. It has been a frustrating experience: all my attempts have achieved only partial success. I have been using the pattern "\\n\\p{Blank}++\\x{2A}" to identify and replace the text from the end of a line (\n) to the * on the next line. But that does not handle the *\n that marks a new paragraph.
It occurs to me that selecting and extracting the text alone may be a better solution. Can anybody help?
The result I want is the text alone, with the preceding /* or /**, the trailing */, and the spaces and * at the beginning of each line removed. If the comment is:
/* Quisque congue nibh diam, quis gravida ligula pharetra ut.
* Duis maximus risus turpis, convallis hendrerit sapien
* malesuada non. Integer ornare augue lorem, eu placerat
* velit pharetra quis. Maecenas varius elit ac nulla
* porttitor, id cursus mauris varius. Suspendisse potenti.
* In tempus faucibus nulla posuere aliquam. Sed efficitur
* lorem est, ac ullamcorper nibh blandit eget.
*
* Mauris et interdum enim. Duis ac malesuada ante. Sed ut
* ipsum ut odio aliquet accumsan nec vitae risus. Quisque
* lacinia elit risus, faucibus dapibus neque euismod id.
* Sed eu leo cursus, porttitor justo eget, tincidunt augue.
* Donec sit amet ex non arcu auctor semper id non lorem.
* Nullam ac augue in ipsum iaculis faucibus cursus eget nisi.
* Sed risus tortor, cursus vel blandit in, tempus ut tortor.
* Etiam lobortis tristique sem vitae finibus. Duis sit amet
* turpis lorem. Morbi dictum libero et porta consectetur.
*/
The result I want is:
"Quisque congue nibh diam, quis gravida ligula pharetra ut. Duis maximus risus turpis, convallis hendrerit sapien malesuada non. Integer ornare augue lorem, eu placerat velit pharetra quis. Maecenas varius elit ac nulla porttitor, id cursus mauris varius. Suspendisse potenti. In tempus faucibus nulla posuere aliquam. Sed efficitur lorem est, ac ullamcorper nibh blandit eget.
Mauris et interdum enim. Duis ac malesuada ante. Sed ut ipsum ut odio aliquet accumsan nec vitae risus. Quisque lacinia elit risus, faucibus dapibus neque euismod id. Sed eu leo cursus, porttitor justo eget, tincidunt augue. Donec sit amet ex non arcu auctor semper id non lorem. Nullam ac augue in ipsum iaculis faucibus cursus eget nisi. Sed risus tortor, cursus vel blandit in, tempus ut tortor. Etiam lobortis tristique sem vitae finibus. Duis sit amet turpis lorem. Morbi dictum libero et porta consectetur."
Though without the formatting (line breaks) that has been imposed by this site's editor.
Upvotes: 0
Views: 92
Reputation: 3553
// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// CommentString holds the text of a single comment.
Pattern p = Pattern.compile("^(/?[\\*]{0,2})([^/\\*\\n]{0,})", Pattern.MULTILINE);
Matcher m = p.matcher(CommentString);
boolean found = false;
while (m.find())
{
    // Group 2 is the body of the current line, without the leading /*, /** or *.
    String body = m.group(2).trim();
    if (body.isEmpty())
    {
        // A blank line marks a paragraph break.
        System.out.println("");
    }
    else
    {
        System.out.print(body + " ");
    }
    found = true;
}
if (!found)
{
    System.out.println("NOT");
}
This assumes there are no asterisks [*] or slashes [/] in the body of the comment, because the second group stops capturing at the first one it meets.
^(/?[\\*]{0,2}) - Checks if the line being read starts with a /*, /** or *
([^/\\*\\n]{0,}) - This is the group which captures the body of the comment (anything which is not *, / or \n)
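If the comment body may contain asterisks or slashes, or the continuation lines are indented before the *, one possible variation is to strip the delimiters and line prefixes first and only then join the lines. The following is a rough sketch, not a tested drop-in: the class name CommentText and the method extract are made up for illustration, it assumes Java 8 or later (for \R), and it assumes the input string is exactly one comment that has already been isolated from the source file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative, not from the answer above.
final class CommentText {
    // Opening delimiter: "/*" or "/**" and any blanks right after it.
    private static final Pattern OPEN = Pattern.compile("\\A\\s*/\\*+\\p{Blank}*");
    // Closing delimiter: "*/" at the very end, with surrounding whitespace.
    private static final Pattern CLOSE = Pattern.compile("\\s*\\*+/\\s*\\z");
    // Per-line prefix: optional blanks, one '*', optional blanks.
    private static final Pattern LINE_PREFIX =
            Pattern.compile("(?m)^\\p{Blank}*\\*\\p{Blank}*");
    // Paragraph break: a line break followed by one or more blank lines.
    private static final Pattern PARAGRAPH_BREAK =
            Pattern.compile("\\R(?:\\p{Blank}*\\R)+");

    // Assumes 'comment' is a single /* ... */ or /** ... */ comment.
    static String extract(String comment) {
        String body = OPEN.matcher(comment).replaceFirst("");
        body = CLOSE.matcher(body).replaceFirst("");
        body = LINE_PREFIX.matcher(body).replaceAll("");

        List<String> paragraphs = new ArrayList<>();
        for (String paragraph : PARAGRAPH_BREAK.split(body)) {
            // Join the lines of one paragraph with single spaces.
            String joined = paragraph.trim().replaceAll("\\s+", " ");
            if (!joined.isEmpty()) {
                paragraphs.add(joined);
            }
        }
        // One output line per paragraph.
        return String.join("\n", paragraphs);
    }
}
```

Calling CommentText.extract(CommentString) would return the text with one line per paragraph. Because only the * at the start of each line is removed, characters such as / or * inside the text are kept. An edge case like the empty comment /**/ is not handled by this sketch.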
Upvotes: 0