user3111525

Reputation: 5203

Closing open XML tags with regex

Basically, I want to do the same thing that is done here in Python: replace all self-closed elements with the long syntax.

Example

    <iframe src="http://example.com/thing"/>

becomes

    <iframe src="http://example.com/thing"></iframe>

Full example:

 <html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <link rel="stylesheet" type="text/css" href="/sample.css">
  <title></title>
  <script type="text/javascript" src="/swfobject.js">
                //void
          </script>
  <script type="text/javascript" language="JavaScript" src="/generate.js">
//void
  </script>
  <script type="text/javascript" language="JavaScript" src="/prototype.js">
//void
  </script>
</head>
<body id="mediaPlayer" style="margin:0;padding:0;">
<script type="text/javascript">
                                swfobject.registerObject('id_G12564763');       


                function getFlashObject() {
                        var object;
                        if (navigator.appName == 'Microsoft Internet Explorer' || navigator.userAgent.indexOf("Chrome")!=-1)
                        {
                                object = document.getElementById('id_G12564763');
                        } 
                        else 
                        {
                                object = document['flash_id_G12564763'];
                        }
                        return object;
                }

        </script>
</body>
</html>

Upvotes: 5

Views: 1786

Answers (3)

Scott Evernden

Reputation: 39926

    String resultHtml = inputHtml.replaceAll("(?six)<(\\w+)([^<]*?)/>", "<$1$2></$1>");

This also properly handles tags that are not terminated, such as <hr> and <img>: they are left as they are.
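A minimal sketch of how the pattern above behaves on a small fragment. The class name `SelfClosingTags`, the helper `expand`, and the sample input are made up for illustration; only the regex and the replacement string come from the answer.

```java
public class SelfClosingTags {

    // Applies the regex from the answer to every self-closed tag in the input.
    // Group 1 is the tag name, group 2 is everything between the name and "/>".
    static String expand(String html) {
        return html.replaceAll("(?six)<(\\w+)([^<]*?)/>", "<$1$2></$1>");
    }

    public static void main(String[] args) {
        // Hypothetical input: two self-closed tags and one unterminated <hr>.
        String input = "<p>line<br/>rule<hr><iframe src=\"http://example.com/thing\"/></p>";

        // Prints: <p>line<br></br>rule<hr><iframe src="http://example.com/thing"></iframe></p>
        System.out.println(expand(input));
    }
}
```

Because group 2 is `[^<]*?`, a match can never run across the start of another tag, which is why the unterminated `<hr>` is skipped here. Text that contains a literal `/>` between an unterminated tag and the next `<` would still confuse it, so treat this as a quick fix rather than a parser.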

Upvotes: 1

user3111525

Reputation: 5203

OK guys, I found a workaround. I changed the output method to xml at the point where this HTML is generated, and the XSLT engine takes care of closing those open tags for me. Thanks for the answers. If you do have a solution to the original problem, please leave it as an answer and I will accept it. It could be useful for others.

Upvotes: 1

Topera

Reputation: 12389

This can be used to replace one tag (code in JavaScript):

    var becomes = "<iframe src='http://example.com/thing'/>".replace(/<(\w*) (.*)\//, '<$1 $2></$1');

The same, in Java:

    String becomes = "<iframe src=\"http://example.com/thing\"/>".replaceFirst("<(\\w*) (.*)\\/", "<$1 $2></$1");
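To illustrate why this is limited to one tag: the greedy `(.*)` runs to the last `/` in the input, so two self-closed tags on the same line get merged. The sketch below compares the answer's pattern with a bounded variant; `GreedyVsBounded`, `expandFirst`, and `expandAll` are hypothetical names, and the `[^>]*` variant is my own assumption, not part of the answer.

```java
public class GreedyVsBounded {

    // The pattern from the answer: matches up to the last '/' and leaves the final '>' in place.
    static String expandFirst(String html) {
        return html.replaceFirst("<(\\w*) (.*)\\/", "<$1 $2></$1");
    }

    // Assumed variant: [^>]* cannot cross a '>', so each tag is matched on its own.
    // Like the original, it requires a space after the tag name, and it does not
    // handle attribute values that contain '>'.
    static String expandAll(String html) {
        return html.replaceAll("<(\\w+) ([^>]*)/>", "<$1 $2></$1>");
    }

    public static void main(String[] args) {
        // Works for a single tag: <iframe src="http://example.com/thing"></iframe>
        System.out.println(expandFirst("<iframe src=\"http://example.com/thing\"/>"));

        // Two tags on one line: the greedy version prints <a x/><b y></a>
        System.out.println(expandFirst("<a x/><b y/>"));

        // The bounded version prints <a x></a><b y></b>
        System.out.println(expandAll("<a x/><b y/>"));
    }
}
```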

Upvotes: 1
