Crocode

Reputation: 3134

Java Regular Expression Escape Sequence

I was trying to match the following example: <p><a href="example/index.html">LinkToPage</a></p>

Using rubular.com I came up with something like <a href=\"(.*)?\/index.html\">.*<\/a>.

I'll be using this with Pattern.compile in Java. I know that \ has to be escaped as well, so I came up with <a href=\\\"(.*)?\\\/index.html\\\">.*<\\\/a> and a few other variations, but none of them work. I tested them on regexplanet. Can anyone help me with this?
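For reference, a minimal sketch (class and method names are illustrative, not from the thread): mechanically doubling each backslash in the rubular regex gives a Java string literal that compiles and matches the sample. The `\\\/` sequences in the attempt above are what break it: after `\\` the remaining `\/` is not a legal Java string escape, so javac rejects the literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RubularTranslation {
    // The rubular regex <a href=\"(.*)?\/index.html\">.*<\/a> with every
    // backslash doubled so that it survives the Java string literal.
    static final String REGEX = "<a href=\\\"(.*)?\\/index.html\\\">.*<\\/a>";

    // Returns the first capture group of the first match, or null if none.
    static String firstGroup(String html) {
        Matcher m = Pattern.compile(REGEX).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Prints the regex exactly as it was typed into rubular.
        System.out.println(REGEX);
        System.out.println(firstGroup("<p><a href=\"example/index.html\">LinkToPage</a></p>"));
    }
}
```

Java's regex engine accepts `\"` and `\/` because a backslash may precede any non-alphabetic character, so this direct translation works even though those escapes are unnecessary.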

Upvotes: 0

Views: 235

Answers (3)

John Humphreys

Reputation: 39354

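You can give Java the literal text to match and call Pattern.quote(str), which returns a pattern that matches str exactly, so you don't have to work out the escaping yourself. Only quote the literal parts; regex syntax such as the (.*) group has to stay outside the quoted pieces.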
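A minimal sketch of that approach, assuming the literal pieces are quoted separately and joined with the regex parts (the class and method names are hypothetical). As a side effect, the dot in index.html is matched literally, which the hand-written patterns in the other answers do not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteExample {
    // Literal pieces go through Pattern.quote, so nothing inside them needs
    // hand escaping; the capture group and wildcard stay as regex syntax.
    static final Pattern LINK = Pattern.compile(
            Pattern.quote("<a href=\"") + "(.*)"
            + Pattern.quote("/index.html\">") + ".*"
            + Pattern.quote("</a>"));

    // Returns the text captured before /index.html, or null if no match.
    static String hrefPrefix(String html) {
        Matcher m = LINK.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(hrefPrefix("<p><a href=\"example/index.html\">LinkToPage</a></p>"));
    }
}
```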

Upvotes: 0

Aurand

Reputation: 5547

Pattern.compile("<a href=\"(.*)?/index.html\">.*</a>");

That should fix your regex. You do not need to escape the forward slashes.

However, I am obligated to present you with the standard caution against parsing HTML with regex:

RegEx match open tags except XHTML self-contained tags (Stack Overflow question)

Upvotes: 1

Laurence Gonsalves

Reputation: 143354

Use "<a href=\"(.*)/index.html\">.*</a>" in your Java code.

You only need to escape the double quotes, and only because the regex is written inside a Java string literal.

You don't need to escape /, because you aren't delimiting your regex with slashes (as you would be in Ruby).
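A small sketch of the slash point (class and method names are illustrative): in Java the regex is an ordinary String, so `/` is a plain character, and escaping it anyway is harmless.

```java
import java.util.regex.Pattern;

public class SlashEscapes {
    // No /.../ delimiters in Java, so '/' has no special meaning to the engine.
    static boolean plainSlash(String s) {
        return Pattern.matches("a/b", s);
    }

    // "\\/" reaches the engine as \/, which is allowed because a backslash
    // may precede any non-alphabetic character; it still matches a literal '/'.
    static boolean escapedSlash(String s) {
        return Pattern.matches("a\\/b", s);
    }

    public static void main(String[] args) {
        System.out.println(plainSlash("a/b"));   // true
        System.out.println(escapedSlash("a/b")); // true
    }
}
```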

Also, the ? in (.*)? is redundant, so just use (.*). Since * can already match the empty string, making the group optional adds nothing.
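A quick check of that claim against the sample from the question, as a sketch (the helper name is hypothetical): both patterns capture the same text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroup {
    static final String HTML = "<p><a href=\"example/index.html\">LinkToPage</a></p>";

    // Returns group 1 of the first match of regex in input, or null if none.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String withQ = group1("<a href=\"(.*)?/index.html\">.*</a>", HTML);
        String without = group1("<a href=\"(.*)/index.html\">.*</a>", HTML);
        System.out.println(withQ + " " + without); // example example
    }
}
```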

Upvotes: 2
