Reputation: 871
I am using the Jsoup parser to read some HTML and change the titles of specific tags. The problem is that my HTML is altered when it goes through the Jsoup parser. Is there a way to tell Jsoup not to add any html or body tags, or not to add missing tags?
It seems to be altering my tables.
Original:
<div class="mstrPanelPortrait">
<table cellpadding="0" class="pane" cellspacing="0">
<tr>
<td>
<table class="pane" cellspacing="0">
<tr>
<td class="mstrPanelBody" sty="body">
<div>
<div class="mstrBrowser">
After being run through Jsoup:
<div class="mstrPanelPortrait" title="darrensTest">
<table cellpadding="0" class="pane" cellspacing="0">
<tbody>
<tr>
<td>
<table class="pane" cellspacing="0">
<tbody>
<tr>
<td class="mstrPanelBody" sty="body">
You can see that tbody was added in a few places, and I am not sure why.
Entire HTML:
<div or="2" class="mstrTransform" cx="[0,1,2,3,4]" id="FolderObjectBrowser_display" ty="editor" cxid="FolderObjectBrowser_display_cmm" rsz="0" dg="0" iframe="true" style="display:block;" name="FolderObjectBrowser_display" scriptclass="mstrReportAllObjectsImpl" ors="3">
<form id="FolderObjectBrowser_display_form" name="FolderObjectBrowser_display_form" target="frameManager" action="mstrWeb" method="post" onsubmit="appendPageState(this);">
<input id="iframe" name="iframe" value="true" class="mstrHiddenInput" type="hidden"/>
<input name="evt" value="5005" class="mstrHiddenInput" type="hidden"/>
<input name="src" value="mstrWeb.report.5005" class="mstrHiddenInput" type="hidden"/>
<div class="mstrPanelPortrait">
<table cellpadding="0" class="pane" cellspacing="0">
<tr>
<td>
<table class="pane" cellspacing="0">
<tr>
<td class="mstrPanelBody" sty="body">
<div>
<div class="mstrBrowser">
<div id="folerBoxContainerID" class="folerBoxContainer">
<select name="oeFolderID" class="mstrAncestors" sty="folderList">
<option selected="1" title="MicroStrategy Tutorial" level="0" value="D43364C684E34A5F9B2F9AD7108F7828">MicroStrategy Tutorial</option>
<option islink="true" title="Data Explorer" level="0" value="37ED6C6202E14C3181F1F4A043A1CAA8">Data Explorer</option>
<option islink="true" title="My Personal Objects" level="0" value="8D67908E11D3E4981000E787EC6DE8A4">My Personal Objects</option>
<option islink="true" title="Attributes" level="0" value="6F55FB47F9974EABA18CB0C5FF46785C">Attributes</option>
<option islink="true" title="Metrics" level="0" value="E0CCB9CF22104A489CBE78D974AFD19E">Metrics</option>
<option islink="true" title="Hierarchies" level="0" value="C2A0BB1ACAAD45A18B8CA8AECF0A35EE">Hierarchies</option>
</select>
<a><img id="upFolder" title="Up One Level" alt="Up One Level" name="upFolder" class="mstrIcon-btn mstrIcon-btnUpFolderDisabled" src="../images/1ptrans.gif"/></a><a target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=83005&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.83005"><img id="changeFormat" title="Tree" alt="Tree" name="changeFormat" class="mstrIcon-btn mstrIcon-btnChangeDisplayFormatTree" src="../images/1ptrans.gif"/></a>
</div>
<div class="mstrSearchDiv"><span id="name_label">Find:</span><input id="searchArg" name="name" value="" class="mstrInputText" onkeydown="return microstrategy.bone('FolderObjectBrowser_display').checkForFormSubmit(arguments[0]);" type="text"/><input id="search" title="Find" alt="Find" name="98002" class="mstrIcon-btn mstrIcon-btnFind" src="../images/1ptrans.gif" border="0" type="image"/></div>
<div style="position:relative">
<div sty="fileList">
<div id="list" class="mstrSmallIconView">
<div title="Folder: Project Builder; Folder for all the objects created by Project Builder" dss_ty="8"><span class="mstrIcon-lv-f mstrIcon-lv"><span></span></span><a title="Folder for all the objects created by Project Builder" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=42EEDD41A6954F7485453C170AA3F8BE">Project Builder</a></div>
<div title="Folder: Project Objects" dss_ty="8"><span class="mstrIcon-lv-f mstrIcon-lv"><span></span></span><a title="" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=02C37D85EE25483AA5708E2BFE858B92">Project Objects</a></div>
<div title="Folder: Public Objects; Folder for all public objects" dss_ty="8"><span class="mstrIcon-lv-f mstrIcon-lv"><span></span></span><a title="Folder for all public objects" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=98FE182C2A10427EACE0CD30B6768258">Public Objects</a></div>
<div title="Folder: Schema Objects; Folder for all schema objects" dss_ty="8"><span class="mstrIcon-lv-f mstrIcon-lv"><span></span></span><a title="Folder for all schema objects" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=95C3B713318B43D490EE789BE27D298C">Schema Objects</a></div>
<div title="Folder: Data Explorer; Hierarchy groups folder" dss_ty="8"><span class="mstrIcon-lv mstrIcon-lv-fh"><span class="sc"></span></span><a title="Hierarchy groups folder" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=37ED6C6202E14C3181F1F4A043A1CAA8">Data Explorer</a></div>
<div title="Folder: My Personal Objects" dss_ty="8"><span class="mstrIcon-lv mstrIcon-lv-fmo"><span class="sc"></span></span><a title="" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=8D67908E11D3E4981000E787EC6DE8A4">My Personal Objects</a></div>
<div title="Folder: Attributes" dss_ty="8"><span class="mstrIcon-lv mstrIcon-lv-fa"><span class="sc"></span></span><a title="" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=6F55FB47F9974EABA18CB0C5FF46785C">Attributes</a></div>
<div title="Folder: Metrics" dss_ty="8"><span class="mstrIcon-lv mstrIcon-lv-fm"><span class="sc"></span></span><a title="" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=E0CCB9CF22104A489CBE78D974AFD19E">Metrics</a></div>
<div title="Folder: Hierarchies" dss_ty="8"><span class="mstrIcon-lv mstrIcon-lv-fh"><span class="sc"></span></span><a title="" target="frameManager" class="mstrLink" onclick="return submitLink(this, event);" href="mstrWeb?iframe=true&evt=98001&src=mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001&oeFolderBlockBegin=1&oeFolderID=C2A0BB1ACAAD45A18B8CA8AECF0A35EE">Hierarchies</a></div>
</div>
</div>
</div>
<table id="FolderObjectBrowser_display_oCount" width="100%" name="FolderObjectBrowser_display_oCount" cellpadding="0" border="0" cellspacing="2">
<tr>
<td align="LEFT"> 4 item(s) found</td>
</tr>
</table>
</div>
<input name="evt" value="98001" class="mstrHiddenInput" type="hidden"/><input name="src" value="mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98001" class="mstrHiddenInput" type="hidden"/><input name="evt" value="98002" class="mstrHiddenInput" type="hidden"/><input name="src" value="mstrWeb.report.frame.accordion.tbObjBrwsr.pbt.FolderObjectBrowser.98002" class="mstrHiddenInput" type="hidden"/><input id="evtorder" name="evtorder" value="98001,98002" class="mstrHiddenInput" type="hidden"/>
</div>
</td>
</tr>
</table>
</td>
</tr>
</TABLE>
</div>
<div class="mstrSpaceAfterEditor"><img title="" height="3" alt="" width="1" src="../images/1ptrans.gif" border="0"/></div>
</form>
</div>
Upvotes: 2
Views: 656
Reputation: 8879
As far as I know, there is no way to tell Jsoup not to balance tags.
What you can use instead is the non-default XML parser, which won't add any new tags (such as tbody) and only closes tags that are not already balanced.
So the question is whether it is acceptable for Jsoup to balance the HTML as long as it doesn't add any tags.
If so, use the XML parser instead of the default HTML parser:
doc = Jsoup.parse(html, "", Parser.xmlParser());
This parses the HTML and adds any missing closing tags, but does not add elements that aren't already there, so the structure is unchanged. You can then select from the document in the usual Jsoup fashion. For the snippet in the question, the result looks like this:
<div class="mstrPanelPortrait">
<table cellpadding="0" class="pane" cellspacing="0">
<tr>
<td>
<table class="pane" cellspacing="0">
<tr>
<td class="mstrPanelBody" sty="body">
<div>
<div class="mstrBrowser"></div>
</div></td>
</tr>
</table></td>
</tr>
</table>
</div>
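To make the difference concrete, here is a minimal sketch comparing the two parsers on a shortened, unclosed version of the markup from the question. The class name ParserComparison, the helper method names, and the sample string are made up for illustration, and it assumes the jsoup jar is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

// Hypothetical demo class; not part of Jsoup itself.
public class ParserComparison {

    // Default HTML parser: normalizes the input into a full HTML document,
    // which is where the extra html/head/body and tbody elements come from.
    static String withHtmlParser(String html) {
        Document doc = Jsoup.parse(html);
        return doc.outerHtml();
    }

    // XML parser: keeps the element structure of the input and only closes
    // tags that were left open. Selecting and editing works as usual.
    static String withXmlParser(String html, String cssQuery, String title) {
        Document doc = Jsoup.parse(html, "", Parser.xmlParser());
        doc.select(cssQuery).attr("title", title);
        return doc.outerHtml();
    }

    public static void main(String[] args) {
        // Shortened sample input (assumption): the table, tr, td and div are never closed.
        String html = "<div class=\"mstrPanelPortrait\">"
                + "<table class=\"pane\"><tr><td>cell";

        System.out.println("--- HTML parser ---");
        System.out.println(withHtmlParser(html));

        System.out.println("--- XML parser ---");
        System.out.println(withXmlParser(html, "div.mstrPanelPortrait", "darrensTest"));
    }
}
```

One thing to keep in mind with this approach is that the XML parser treats the input as generic XML, so it is worth checking the output on your real page before relying on it.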
Upvotes: 1