Reputation: 4006
The problem is as follows: I have a Javadoc-generated HTML file containing Java class names and some additional information, like this:
{@link ml.foo.bar.BazAccEd} (Text) Some text
{@link ml.foo.bar.BazAccGrp} (Text) Some text BazAccGrpList
{@link ml.foo.bar.BazAccEdOrGroup} (Text) Some text {@link.ml.foo.bar.BazAccEdList}
I need to extract from it (using Ant's regex capabilities) only the short names of the Java classes, and only where they are part of links, with commas replacing the ordinary text between them, so that the sample above would produce:
BazAccEd
BazAccGrp
BazAccEdOrGroup, BazAccEdList
It probably isn't anything too complicated, yet I haven't managed to come up with a regular expression that parses only the links and extracts the correct data from them. Thanks in advance.
Upvotes: 1
Views: 269
Reputation: 4852
This should work, given the inputs you provided. It works by capturing the text between a period and a closing curly brace:
\.([A-Za-z\d_]+)(?=})(?:.+\.([A-Za-z\d_]+)(?=}))*
This will return two captured groups, \1 and \2. In order to get the comma replacement working correctly, you'll have to check whether there's anything in \2. If so, insert a comma between \1 and \2.
Explanation:
\.([A-Za-z\d_]+)(?=}) # look for a period, characters, and lookahead for closing curly brace. Capture the characters
(?: # open a non-capturing group
.+ # gobble up characters until ...
\.([A-Za-z\d_]+)(?=}) # ... you find the same thing as in the first line above
)* # repeat the non-capturing group zero or more times, so it is optional
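If plain Java is an option (for example in a custom Ant task), a simpler variant is to apply only the first part of the pattern repeatedly and join the hits with commas, which avoids having to check whether \2 is empty. This is only a minimal sketch: the class name `LinkNames` and the method `extract` are made up for illustration, and it assumes each input line is processed separately and that ordinary text never contains a period-prefixed word directly followed by `}`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ant or Javadoc.
class LinkNames {
    // A period, the short class name (captured), then a lookahead for '}'.
    private static final Pattern SHORT_NAME =
            Pattern.compile("\\.([A-Za-z\\d_]+)(?=\\})");

    // Returns the short names found in one line, separated by ", ".
    static String extract(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = SHORT_NAME.matcher(line);
        while (m.find()) {
            names.add(m.group(1));
        }
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        // prints "BazAccEd"
        System.out.println(extract("{@link ml.foo.bar.BazAccEd} (Text) Some text"));
        // prints "BazAccEdOrGroup, BazAccEdList"
        System.out.println(extract(
                "{@link ml.foo.bar.BazAccEdOrGroup} (Text) Some text {@link.ml.foo.bar.BazAccEdList}"));
    }
}
```

Because every match is collected in a loop, this also handles lines with more than two links, where a repeated capturing group would keep only its last match.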
Upvotes: 3
Reputation: 756
You can use this regular expression:
\{@link[ .][a-zA-Z]+\.[a-zA-Z]+\.[a-zA-Z]+\.([A-Za-z0-9]+)\}
Upvotes: 1