Reputation: 60778
This HTML:
<td height="79" valign="top" width="70">
<a href="http://e.livinghuntington.com/HS?a=stuff" target="_blank" title="Follow us on Twitter: http://twitter.com/#!/HuntingtonLive"> link link link <img alt="Follow us on Twitter: http://twitter.com/#!/HuntingtonLive" border="0" height="79" src="http://webe.emv3.com/livinghuntington/images/tt.png" style="display:block;" width="70"/></a>
</td>
</table>
<table>
and this code:
public void handleStartTag(Tag tag, MutableAttributeSet attr, int pos) {
    System.err.println("tag = " + tag);
}
gives this output:
tag = td
tag = a
tag = table
I tried a few experiments. If I nest a second link inside the link (I don't know whether that is even valid HTML), the parser correctly reports the inner link. If I move the image out of the link, the img is still not reported. As far as I can tell, image tags are never reported at all. Is there an error in my code, is there a workaround, or is this an irreparable problem with the HTML parser (so that I need to chuck it and use a different one)?
Upvotes: 1
Views: 302
Reputation: 60778
The issue was that img is a simple tag (it has no closing tag), so it is never reported to handleStartTag(). handleSimpleTag() is the callback to override for it.
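For anyone hitting the same thing, here is a minimal sketch of a callback that overrides both handlers. It assumes the parser is driven through javax.swing.text.html.parser.ParserDelegator, which the question does not actually show; the class name SimpleTagDemo, the helper collectTags, and the sample HTML string are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class, not part of the original question.
class SimpleTagDemo {

    // Parses the given HTML and records which callback reported each tag.
    static List<String> collectTags(String html) throws IOException {
        final List<String> seen = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attr, int pos) {
                // Tags that open a block and expect a matching end tag (td, a, table, ...).
                seen.add("start:" + tag);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attr, int pos) {
                // Tags with no content and no end tag (img, br, hr, ...).
                seen.add("simple:" + tag);
            }
        };
        // Assumption: ParserDelegator is the entry point; "true" ignores charset directives.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return seen;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample input.
        String html = "<html><body><a href=\"x\">link <img src=\"y.png\" alt=\"z\"></a></body></html>";
        for (String entry : collectTags(html)) {
            System.err.println(entry);
        }
    }
}
```

Inside handleSimpleTag() the image's attributes are available through the attr parameter, for example attr.getAttribute(HTML.Attribute.SRC).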
Upvotes: 2