user176957

Reputation:

Regex to match the first ending HTML tag

I am trying to write a regex that matches the first closing form tag.

  <form.*name="loginForm".*>[^~]*</form> 

The above regex matches up to the end of the second form, i.e. up to line 8, but I want a regex that stops at the first closing form tag. In the example below, it should match up to line 5.

<html>
<body>
<form method = "post" name="loginForm" >
<input type="text" name="userName"/>
</form>
<form method = "post" name="signupForm" >
<input type="text" name="userName"/>
</form>
</body>
</html>

Upvotes: 3

Views: 8526

Answers (3)

Guffa

Reputation: 700312

Just make the pattern non-greedy so that it matches the smallest possible number of characters instead of the largest possible:

<form[^>]*name="loginForm"[^>]*>[^~]*?</form>

Edit:
Changed .* to [^>]* in the form tag, so that it doesn't match outside the tag.
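To make the difference concrete, here is a minimal sketch in Python (the question does not say which regex flavor is in use, so Python's `re` module is an assumption) that runs the greedy and non-greedy variants of the pattern against the sample markup from the question. The variable names are made up for illustration.

```python
import re

# Sample markup from the question.
html = """<html>
<body>
<form method = "post" name="loginForm" >
<input type="text" name="userName"/>
</form>
<form method = "post" name="signupForm" >
<input type="text" name="userName"/>
</form>
</body>
</html>"""

# Greedy: [^~]* grabs as much as it can, then backtracks to the LAST </form>.
greedy = re.compile(r'<form[^>]*name="loginForm"[^>]*>[^~]*</form>')
# Non-greedy: [^~]*? grows one character at a time and stops at the FIRST </form>.
lazy = re.compile(r'<form[^>]*name="loginForm"[^>]*>[^~]*?</form>')

greedy_match = greedy.search(html).group(0)
lazy_match = lazy.search(html).group(0)

print(greedy_match.count("</form>"))  # 2: runs through the signup form too
print(lazy_match.count("</form>"))    # 1: stops after the login form
```

Note that `[^~]` is only a stand-in for "any character including newlines"; it would stop matching if the form body ever contained a literal `~`.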

Upvotes: 12

meder omuraliev

Reputation: 186562

You should NOT use regular expressions; parse the HTML with the DOM instead:

JavaScript:

var forms = document.getElementsByTagName('form');
var first = forms[0]; // the first form element

PHP:

$dom = new DOMDocument();
$dom->loadHTML( $html );
$forms = $dom->getElementsByTagName('form');
$first = $forms->item(0); // reference to first form

You can use minidom and ElementTree for Python.
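For the Python route, a rough sketch with ElementTree might look like the following. It assumes the markup is well-formed XML, which happens to be true for the sample in the question (the `input` tags are self-closed) but is not true of HTML in general. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question; it parses as XML because every tag is closed.
html = """<html>
<body>
<form method = "post" name="loginForm" >
<input type="text" name="userName"/>
</form>
<form method = "post" name="signupForm" >
<input type="text" name="userName"/>
</form>
</body>
</html>"""

root = ET.fromstring(html)

# First <form> element in document order.
first_form = root.find(".//form")

# Or select the form by its name attribute rather than by position.
login_form = root.find(".//form[@name='loginForm']")

# Serialize just that element back to markup.
print(ET.tostring(login_form, encoding="unicode"))
```

If the input is real-world HTML with unclosed tags, `ET.fromstring` will raise a parse error, so a lenient HTML parser is the safer choice there.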

Upvotes: 2

Gumbo

Reputation: 655239

Use a real parser like DOMDocument, SimpleXML or SimpleHTMLDOM. Regular expressions are not suitable for parsing non-regular languages like HTML.
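As one illustration of why, here is a small Python sketch (Python and its standard-library `html.parser` are swapped in here; they are not among the parsers named above). The markup is a made-up variation of the question's form with `</form>` appearing inside a comment, and `FormInputCollector` is a hypothetical helper class. The non-greedy regex stops at the commented-out tag, while the parser treats the comment as a comment.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: a closing tag appears inside a comment.
html = '''<form name="loginForm">
<!-- old markup: </form> -->
<input type="text" name="userName">
</form>'''

# The non-greedy regex stops at the first "</form>" text, which is in the comment.
lazy = re.compile(r'<form[^>]*name="loginForm"[^>]*>[^~]*?</form>')
regex_match = lazy.search(html).group(0)


class FormInputCollector(HTMLParser):
    """Collects the names of <input> elements inside the named form."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.inside = False
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("name") == self.form_name:
            self.inside = True
        elif tag == "input" and self.inside:
            self.inputs.append(attrs.get("name"))

    def handle_endtag(self, tag):
        # Comments are routed to handle_comment, so only a real </form> lands here.
        if tag == "form":
            self.inside = False


collector = FormInputCollector("loginForm")
collector.feed(html)
collector.close()

print("userName" in regex_match)  # False: the regex match ended inside the comment
print(collector.inputs)           # ['userName']
```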

Upvotes: 2
