Reputation: 6328
I am trying to parse a web page that contains some JavaScript. So far I have been using Jsoup
to parse the HTML in Java, which works as expected, but I am unable to parse the JavaScript. Below is a snippet of the HTML page:
<script type="text/javascript">
var element = document.createElement("input");
element.setAttribute("type", "hidden");
element.setAttribute("value", "");
element.setAttribute("name", "AzPwXPs");
element.setAttribute("id", "AzPwXPs");
var foo = document.getElementById("dnipb");
foo.appendChild(element);
var element1 = document.createElement("input");
element1.setAttribute("type", "hidden");
element1.setAttribute("value", "6D6AB8AECC9B28235F1DE39D879537E1");
element1.setAttribute("name", "ZLZWNK");
element1.setAttribute("id", "ZLZWNK");
foo.appendChild(element1);
</script>
I want to read both values together with their name/id, so that after parsing I get the following results:
AzPwXPs=
ZLZWNK=6D6AB8AECC9B28235F1DE39D879537E1
How can I parse the script in this situation?
Upvotes: 4
Views: 13099
Reputation: 347
I once had a similar situation, where I needed to find URLs in CSS files.
Put the JavaScript in a string and apply a regular expression to it:
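// Matches url(...), with optional single or double quotes around the URL
Pattern p = Pattern.compile("url\\(\\s*(['\"]?+)(.*?)\\1\\s*\\)");
Matcher m = p.matcher(content); // content holds the text to search
while (m.find()) {
    // m.group() is the whole match, e.g. url('a.png'); m.group(2) is only the URL inside
    String urlFound = m.group();
}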
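Applied to the script in the question, the same idea might look like the sketch below. It assumes the script always uses the exact form variable.setAttribute("key", "value") with double-quoted literals, as in the snippet shown; the class name ScriptAttributeExtractor and the method extract are made up for illustration. The script text itself would first have to be pulled out of the page, for example via Jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects setAttribute("key", "value") calls per JS variable.
class ScriptAttributeExtractor {

    // group(1) = variable name, group(2) = attribute key, group(3) = attribute value
    private static final Pattern SET_ATTRIBUTE = Pattern.compile(
            "(\\w+)\\.setAttribute\\(\\s*\"([^\"]*)\"\\s*,\\s*\"([^\"]*)\"\\s*\\)");

    // Returns a map from each element's "name" attribute to its "value" attribute,
    // in the order the elements first appear in the script.
    static Map<String, String> extract(String script) {
        Map<String, Map<String, String>> attributesByVariable = new LinkedHashMap<>();
        Matcher m = SET_ATTRIBUTE.matcher(script);
        while (m.find()) {
            attributesByVariable
                    .computeIfAbsent(m.group(1), k -> new LinkedHashMap<>())
                    .put(m.group(2), m.group(3));
        }

        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> attributes : attributesByVariable.values()) {
            String name = attributes.get("name");
            if (name != null) { // skip variables that never set a name
                result.put(name, attributes.getOrDefault("value", ""));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String script =
                "var element = document.createElement(\"input\");\n"
              + "element.setAttribute(\"type\", \"hidden\");\n"
              + "element.setAttribute(\"value\", \"\");\n"
              + "element.setAttribute(\"name\", \"AzPwXPs\");\n"
              + "var element1 = document.createElement(\"input\");\n"
              + "element1.setAttribute(\"value\", \"6D6AB8AECC9B28235F1DE39D879537E1\");\n"
              + "element1.setAttribute(\"name\", \"ZLZWNK\");\n";
        extract(script).forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

A regular expression like this is brittle: it breaks as soon as the page switches to single quotes, builds the values by string concatenation, or sets them some other way, so it is only worth using when the script's layout is known and stable.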
Regards, Hugo Pedrosa
Upvotes: 2
Reputation: 508
I have stumbled upon this question a few times while searching for a way to parse pages that use JavaScript, but the solutions provided are not perfect. I found a pure Java solution: use JBrowserDriver to load the page and run its JavaScript, then parse the resulting page source with Jsoup.
Simple example:
// JBrowserDriver part
JBrowserDriver driver = new JBrowserDriver(Settings.builder()
        .timezone(Timezone.EUROPE_ATHENS)
        .build());
driver.get(FETCH_URL);
String loadedPage = driver.getPageSource();
// JSoup parsing part
Document document = Jsoup.parse(loadedPage);
Elements elements = document.select("#nav-console span.data");
log.info("Found element count: {}", elements.size());
driver.quit();
Upvotes: 6
Reputation: 6476
Selenium WebDriver is fantastic: http://docs.seleniumhq.org/docs/03_webdriver.jsp
See this answer for an example of what you are trying to do: Using Selenium Web Driver to retrieve value of a HTML input
Upvotes: 1
Reputation: 120516
Once you've got the text content of the <script> element from Jsoup, you can parse the JS using the Caja JS parser and then walk the parse tree to find what you're looking for.
Upvotes: 1