Reputation: 257
I want to parse an HTML document and get the result as a string. The body of the outer HTML contains another HTML string, and I want that inner HTML as the output string.
Example input HTML:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html><head></head><body><p><!DOCTYPE html><br /><html><br /><body><br /><br /><h1>My First Heading</h1><br /><br /><p>My first paragraph.</p><br /><br /></body><br /></html><br /><br /></p></body></html>
Output string:
<!DOCTYPE html><html><body><h1>My First Heading</h1><p>My first paragraph.</p></body></html>
Important: I am using an HTML editor that, when I type something into it, returns the HTML representation of that input from getText. The first HTML string above is exactly that representation.
The output string should also match what is rendered when I run the first string here: http://www.w3schools.com/html/tryit.asp?filename=tryhtml_basic
Please help me with this.
Upvotes: 0
Views: 469
Reputation: 1205
I would go with a regex:
(<!DOCTYPE html>).*(<html>.*</html>).+
Then take group 1 and group 2:
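import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Unescape the entity-encoded inner markup returned by the editor.
tst = tst.replace("&lt;", "<").replace("&gt;", ">");
Pattern p = Pattern.compile("(<!DOCTYPE html>).*(<html>.*</html>).*</html>.*");
Matcher m = p.matcher(tst);
// Group 1 is the inner doctype, group 2 the inner <html>...</html> element.
if (m.find()) {
    System.out.println(m.group(1) + m.group(2));
}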
Running example: http://rextester.com/JTOJ89529
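Note that the regex above keeps the `<br />` tags the editor inserted between the inner tags, so the result is not quite the output string asked for. Here is a minimal sketch that also strips them, assuming the editor returns the inner markup with `&lt;`/`&gt;` escapes and separates lines with `<br />`. The class name `InnerHtmlExtractor` and the method `extract` are made up for illustration, and only those two entities are handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnerHtmlExtractor {
    // Inner document: from the short doctype up to the first closing </html>.
    // The outer "<!DOCTYPE html PUBLIC ...>" does not match because of the literal ">".
    private static final Pattern INNER =
            Pattern.compile("<!DOCTYPE html>.*?</html>", Pattern.DOTALL);

    /** Returns the inner HTML document, or null if none is found. */
    static String extract(String editorText) {
        String s = editorText
                .replaceAll("<br\\s*/?>", "")   // drop the editor's real line breaks first
                .replace("&lt;", "<")            // then unescape the inner markup
                .replace("&gt;", ">");
        Matcher m = INNER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical editor output for demonstration only.
        String editorText = "<html><body><p>&lt;!DOCTYPE html&gt;<br />"
                + "&lt;html&gt;<br />&lt;body&gt;&lt;h1&gt;Hi&lt;/h1&gt;&lt;/body&gt;<br />"
                + "&lt;/html&gt;<br /></p></body></html>";
        System.out.println(extract(editorText));
    }
}
```

Stripping `<br />` before unescaping means that a line break typed as text inside the inner document (an escaped `&lt;br&gt;`) survives. For anything less regular than this input, a real HTML parser is probably a safer choice than regexes.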
Upvotes: 1