Srini

Reputation: 1

Regex to match anchor tags where the link text is not the same as the href value

I am looking for a Java-compatible regular expression that matches only anchor tags whose link text is not the same as the href value.

Example 1 (should not be matched):

<a href="http://www.google.co.in">http://www.google.co.in</a>

Example 2 (should be matched):

<a href="http://www.google.co.in">Google</a>

I have tried the following, but it is not working as intended:

 <a(.*?)(?i)href\\s*=\\s*"([^"\\s]+)"(.*?)>(?=\\2)(.+?)</a>

Upvotes: 0

Views: 480

Answers (1)

TheLostMind

Reputation: 36304

Well, if you really want to do this with a regex, you have to capture the value of href first and then use a backreference to check whether the same text appears again later in the tag:

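public class HrefBackreferenceDemo {
    public static void main(String[] args) {
        // Group 1 captures the href value; \1 requires the same text to appear again later.
        String regex = "<a href=\"(.*?)\".*\\1.*";

        // Link text equals the href value, so the backreference matches.
        String s = "<a href=\"http://www.google.co.in\">http://www.google.co.in</a>";
        System.out.println(s.matches(regex));

        // Link text differs from the href value, so the backreference fails.
        String s1 = "<a href=\"http://www.google.co.in\">http://www.google12.co.in</a>";
        System.out.println(s1.matches(regex));
    }
}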

Output:

true
false
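
Note that the check above prints true when the link text repeats the href, which is the opposite of what the question asks for. One way to invert it is a negative lookahead on the backreference. The following is a minimal sketch, not a definitive solution: the class name AnchorTextCheck and the method textDiffersFromHref are hypothetical, it assumes double-quoted href values without whitespace (as in the question's own attempt), and like any regex over HTML it will miss unusual markup.

```java
import java.util.regex.Pattern;

class AnchorTextCheck {
    // Group 1 captures the href value. The negative lookahead (?!\1</a>)
    // rejects the tag when the link text is exactly that value.
    // CASE_INSENSITIVE also makes the backreference comparison ignore case.
    private static final Pattern MISMATCHED_ANCHOR = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"\\s]+)\"[^>]*>(?!\\1</a>)(.+?)</a>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the input contains an anchor
    // whose link text differs from its href value.
    static boolean textDiffersFromHref(String html) {
        return MISMATCHED_ANCHOR.matcher(html).find();
    }

    public static void main(String[] args) {
        // Example 1 from the question: text equals href, so this prints false.
        System.out.println(textDiffersFromHref(
                "<a href=\"http://www.google.co.in\">http://www.google.co.in</a>"));
        // Example 2 from the question: text differs from href, so this prints true.
        System.out.println(textDiffersFromHref(
                "<a href=\"http://www.google.co.in\">Google</a>"));
    }
}
```

The only change from the question's attempt that matters is swapping the positive lookahead (?=\2) for a negative one and anchoring it with </a>, so that link text which merely starts with the href value still counts as different.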

Upvotes: 1
