Reputation: 85
I'm trying to get the text inside a certain tag. So if I have:
<a href="http://something.com">Found</a>
I want to be able to retrieve the text Found.
I'm trying to do it using regex. I am able to do it if the opening tag <a href="http://something.com"> stays the same, but it doesn't.
So far I have this:
Pattern titleFinder = Pattern.compile( ".*[a-zA-Z0-9 ]* ([a-zA-Z0-9 ]*)</a>.*" );
I think the last two parts, ([a-zA-Z0-9 ]*)</a>.* , are OK, but I don't know what to do for the first part.
Upvotes: 6
Views: 9112
Reputation: 336468
As others have said, don't use regex to parse HTML. If you are aware of the shortcomings, you might get away with it, though. Try
Pattern titleFinder = Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
Matcher regexMatcher = titleFinder.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group(1)
}
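This will iterate over all matches in the string.
It won't handle nested <a> tags, and it ignores all the attributes inside the tag. Note that <a[^>]*> also matches other tags whose names start with "a", such as <abbr>; writing <a\b[^>]*> avoids that.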
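For reference, here is a minimal, self-contained sketch of the loop above that collects every captured group into a list. The class name LinkTextExtractor, the method extractLinkTexts, and the second sample URL are hypothetical, chosen only for illustration; the pattern is the one shown above, so the same caveats apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class LinkTextExtractor {
    // Any opening <a ...> tag, a lazily captured body, then the closing </a>.
    // DOTALL lets the body span line breaks; CASE_INSENSITIVE accepts <A ...>.
    private static final Pattern TITLE_FINDER =
            Pattern.compile("<a[^>]*>(.*?)</a>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Returns the text between each <a ...> and the next </a>, in document order.
    static List<String> extractLinkTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = TITLE_FINDER.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1)); // group 1 is the captured link text
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://something.com\">Found</a> and "
                + "<A HREF=\"http://other.example\">Also found</A>";
        System.out.println(extractLinkTexts(html)); // prints [Found, Also found]
    }
}
```

Because the href value is never inspected, a changing URL does not affect the match, which is the part the question was stuck on.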
Upvotes: 6
Reputation:
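String text = str.replaceAll("</?a\\b[^>]*>", ""); // [^>]* also covers opening tags with attributes such as href; the result must be assigned because strings are immutable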
Here is an online ideone demo.
Here is a similar topic: How to remove the tags only from a text?
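A minimal sketch of the tag-stripping approach, assuming the opening tag may carry attributes such as href and that only the anchor tags themselves should be removed. The class name AnchorTagStripper and the method stripAnchorTags are hypothetical names used for illustration.

```java
// Hypothetical helper class, not part of any library.
class AnchorTagStripper {
    // Removes opening and closing anchor tags but keeps the text between them.
    // (?i) makes the match case-insensitive; \b stops "a" from matching <abbr>.
    static String stripAnchorTags(String html) {
        return html.replaceAll("(?i)</?a\\b[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "See <a href=\"http://something.com\">Found</a> here";
        System.out.println(stripAnchorTags(html)); // prints See Found here
    }
}
```

Unlike the capturing approach, this keeps all surrounding text as well, so it only isolates the link text when the input is nothing but the link itself.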
Upvotes: 0