I am trying to run XMLParse against content that is not valid XHTML: the tags in the HTML are not properly terminated. To terminate them, I am using the replace function to find the invalid markup and replace it with properly terminated markup. When I do this, my application throws an error saying that the meta tag is invalid:
An error occured while Parsing an XML document.
The element type "meta" must be terminated by the matching end-tag "</meta>".
The code I'm trying to validate is:
<html>
<head>
<title>Impart Client Interface</title>
<link href="side_panel.css" rel="stylesheet" type="text/css">
<link href="default.css" rel="stylesheet" type="text/css">
<link href="tabs.css" rel="stylesheet" type="text/css">
<link href="data_tables.css" rel="stylesheet" type="text/css">
<link href="xp_button.css" rel="stylesheet" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
What I've created in CF to attempt to handle this is:
<cfset xml = objResponse.FileContent>
<cfset page.content = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'>
<cfset page.updatedcontent = replace('#page.content#','8859-1','8859-1" />"','')>
<Cfset page.link = 'type="text/css">'>
<cfset page.updatedLink = replace('#page.link#', 'css">', 'css" />', 'all')>
<cfset validXML = replace(#xml#, "#page.content#", "#page.updatedContent#", "")>
<cfset validXML = replace(#xml#, "#page.link#", "#page.UpdatedLink#", "all")>
<cfoutput>#validXML#</cfoutput>
<cfset parsethis = xmlparse(validXML)>
<cfdump var="#parsethis#">
How can I resolve this error?
Reputation: 16945
Looks to me like you are missing part of the substring in your replace call:
<cfset page.updatedcontent = replace(page.content,'8859-1">','8859-1" />')>
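Note the addition of "> at the end of the search string, so that the tag's original closing "> is replaced by " /> rather than left dangling after it.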
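Putting that fix back into the code from the question, a minimal sketch might look like the following. It assumes the only unterminated tags are the meta and link tags shown in the question; XMLParse will still fail on any other unclosed tag in the page. Note also that the second replace in the question starts from xml again rather than from validXML, which would discard the meta fix, so this sketch chains both calls on validXML.

```cfm
<cfset xml = objResponse.FileContent>

<!--- Close the meta tag; the search string includes the trailing "> --->
<cfset validXML = replace(xml, '8859-1">', '8859-1" />', "all")>

<!--- Close every stylesheet link tag; this starts from validXML, not xml,
      so the meta fix above is preserved --->
<cfset validXML = replace(validXML, 'type="text/css">', 'type="text/css" />', "all")>

<cfset parsethis = xmlParse(validXML)>
<cfdump var="#parsethis#">
```

The scope argument is given as "all" in both calls; the empty string used as the scope in the question is not one of the documented values ("one" or "all").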
That addresses your specific technical question. I'd like to suggest a better approach to the general task, however. Using string manipulation to mash HTML into proper XHTML is tricky at best (as you have seen). Instead, consider abandoning XMLParse in favor of an actual HTML parser, such as jsoup. After you download the jar and add it to your CF classpath, you can do things like this:
<cfset jsoup = CreateObject("java", "org.jsoup.Jsoup")>
<cfsavecontent variable="html">
<html>
<body>
<hr>
<pre id="blah">Foo<br>bar1</pre>
<hr>
<pre id="blah2">Foo<br>bar2</pre>
</body>
</html>
</cfsavecontent>
<cfdump var="#jsoup.parse(html).select('pre').first().html()#">
Which will output:
Foo<br />bar1
Pretty spiffy, eh? And no need to pull out your hair over getting exact details right with XML.
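Applied to the page from the question, the same approach might look like the sketch below. The question does not say what it needs from the document, so listing the stylesheet hrefs is purely an illustrative assumption; doc, links, it and link are hypothetical variable names, and objResponse.FileContent is assumed to hold the HTML as a string.

```cfm
<!--- Assumes the jsoup jar is already on the CF classpath --->
<cfset jsoup = CreateObject("java", "org.jsoup.Jsoup")>

<!--- Parse the raw HTML as-is; unterminated tags are not a problem here --->
<cfset doc = jsoup.parse(objResponse.FileContent)>

<!--- Select every link element that carries an href attribute --->
<cfset links = doc.select("link[href]")>

<!--- Walk the result with a plain Java iterator and print each href --->
<cfset it = links.iterator()>
<cfoutput>
<cfloop condition="it.hasNext()">
    <cfset link = it.next()>
    #link.attr("href")#<br>
</cfloop>
</cfoutput>
```

Because jsoup tolerates the unterminated link and meta tags, none of the replace calls are needed on this path.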