Reputation: 392
<html>
<head></head>
<frameset cols="180,590,*" border="0">
<frame src="test.html" name="main" noresize="" scrolling="no" marginwidth="0" marginheight="0">
<frame src="http://www.test.com/my.php" name="right" noresize="" scrolling="auto" marginwidth="0" marginheight="0">
#document <!-- what is this? -->
<html>
<head>
<title>TEST</title>
</head>
<body></body>
</html>
</frame>
</frameset>
</html>
I'm parsing a web page and have run into a problem.
What is the #document node?
And how can I parse the <html> below #document using Jsoup?
Upvotes: 4
Views: 1485
Reputation: 43013
And how can I parse the <html> below #document using Jsoup?
You can think of #document as a "virtual" node: it is how browser developer tools display the document loaded inside a frame. Jsoup won't see it, and it is not present in the actual HTML source either.
What you want is to fetch each frame's document with Jsoup. See below:
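Document doc = ...; // HTML page containing the frameset; it needs a base URI so that relative src values can be resolved
// select() returns an Elements collection, so take the first match and call absUrl() on that element
Document mainFrameDocument = Jsoup.connect(doc.select("frame[name=main]").first().absUrl("src")).get();
Document rightFrameDocument = Jsoup.connect(doc.select("frame[name=right]").first().absUrl("src")).get();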
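If the frame names are not known in advance, the same idea can be applied to every frame on the page. The following is only a minimal sketch, not a definitive implementation: the class name FrameUrls, the helper frameUrls, and the base URI http://www.example.com/ are assumptions made for illustration. It parses the frameset from a string, so it runs without network access, and it only resolves the URLs.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.LinkedHashMap;
import java.util.Map;

public class FrameUrls {

    // Hypothetical helper: maps each frame's name attribute to its absolute src URL.
    // baseUri is the address the frameset page was loaded from (assumed here);
    // without it, absUrl() cannot resolve relative values such as "test.html".
    static Map<String, String> frameUrls(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Map<String, String> urls = new LinkedHashMap<>();
        for (Element frame : doc.select("frame[src]")) {
            urls.put(frame.attr("name"), frame.absUrl("src"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Same frameset as in the question, without the #document part,
        // which is not in the real HTML source.
        String html = "<html><head></head>"
                + "<frameset cols=\"180,590,*\" border=\"0\">"
                + "<frame src=\"test.html\" name=\"main\">"
                + "<frame src=\"http://www.test.com/my.php\" name=\"right\">"
                + "</frameset></html>";

        // Print each frame name with the URL its content would be fetched from.
        frameUrls(html, "http://www.example.com/")
                .forEach((name, url) -> System.out.println(name + " -> " + url));
    }
}
```

Each resolved URL can then be passed to Jsoup.connect(url).get() to obtain the document that the browser shows under #document for that frame.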
Upvotes: 3