I want to use jsoup to extract a string value from a script on an HTML page, but I'm running into problems.
Here is the script I'm interested in:
<script type="text/javascript">window._sharedData={
  "entry_data": {
    "PostPage": [
      {
        "media": {
          "key": "This is the key and i wanna catch it!!!",
        },
      }
    ]
  },
};</script>
I have tried several approaches, but none of them worked.
I'm looking forward to an answer, so please don't let me down!
Jsoup can only give you the contents of the script tag as a string: it parses HTML, not the JavaScript inside the script. Since in your case the script contents are a simple object in JSON notation, you can feed them to a JSON parser after extracting the script string and stripping off the variable assignment. The code below uses the json-simple parser.
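import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SharedDataExample {
    public static void main(String[] args) {
        // sample page: the wanted script is the 4th <script> element
        String html = "<script></script><script></script><script></script>"
                + "<script type=\"text/javascript\">window._sharedData={"
                + " \"entry_data\": {"
                + "  \"PostPage\": ["
                + "   {"
                + "    \"media\": {"
                + "     \"key\": \"This is the key and i wanna catch it!!!\","
                + "    },"
                + "   }"
                + "  ]"
                + " },"
                + "};</script><script></script>";

        Document doc = Jsoup.parse(html);
        // get the 4th script (index 3)
        Element scriptEl = doc.select("script").get(3);
        // data() returns the raw contents of a script element
        String scriptContentStr = scriptEl.data();

        // clean to get json: drop "window._sharedData=" and the trailing ";"
        String jsonStr = scriptContentStr
                .replaceFirst("^.*=\\{", "{") // clean beginning
                .replaceFirst(";$", "");      // clean end

        // note: the sample has trailing commas, which strict JSON parsers reject
        JSONObject jo = (JSONObject) JSONValue.parse(jsonStr);
        JSONArray postPageJA = (JSONArray) ((JSONObject) jo.get("entry_data")).get("PostPage");
        JSONObject postJO = (JSONObject) postPageJA.get(0);
        JSONObject mediaJO = (JSONObject) postJO.get("media");
        String keyStr = (String) mediaJO.get("key");
        System.out.println("keyStr = " + keyStr);
    }
}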
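The two `replaceFirst` calls above assume that the assignment and its opening `{` sit on the first line with no space between `=` and `{`, and that the script ends with `};`. If the script starts with a line break, or is written as `= {`, the pattern `^.*=\\{` does not match and the JSON string keeps its prefix. A minimal alternative, sketched here under the assumption that the script holds a single assignment of one object literal, slices the string by position instead; `ScriptJson` and `stripAssignment` are hypothetical names, not part of jsoup or json-simple:

```java
// Hypothetical helper (not part of jsoup or json-simple): cuts the object
// literal out of a "name = {...};" script by position instead of by regex.
class ScriptJson {

    // Returns the text from the first '{' after the first '=' up to the last '}'.
    // Assumes the script holds a single assignment of one object literal.
    static String stripAssignment(String script) {
        int eq = script.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("no assignment found");
        }
        int start = script.indexOf('{', eq);
        int end = script.lastIndexOf('}');
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("no object literal found");
        }
        return script.substring(start, end + 1);
    }

    public static void main(String[] args) {
        // a leading newline and spaces around '=' defeat the "^.*=\\{" regex
        String script = "\n  window._sharedData = {\"entry_data\": {}};  \n";
        System.out.println(stripAssignment(script)); // {"entry_data": {}}
    }
}
```

The result can be passed to `JSONValue.parse` in place of `jsonStr`. Slicing by index does not check what lies between the braces; that validation is left to the JSON parser.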
Parsing the JSON is somewhat involved, and it depends on your knowing the structure of the object. A much simpler way may be to use a regular expression:
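// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
// "html" is the same page string as in the code above
Pattern p = Pattern.compile(
        "media[\":\\s\\{]+key[\":\\s\\{]+\"([^\"]+)\"");
Matcher m = p.matcher(html);
if (m.find()) {
    // group(1) is the text between the quotes after "key"
    String keyFromRE = m.group(1);
    System.out.println("keyStr (via RegEx) = " + keyFromRE);
}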
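The same regex idea can be wrapped in a small helper so that the key names become parameters. This is a hedged sketch: `KeyFinder` and `findStringValue` are hypothetical names, and `Pattern.quote` makes the key names match literally. Like the pattern above, it only works when the wanted key is the first entry of its parent object and the value contains no escaped quotes; a parent key such as `media` would also match inside a longer name like `social_media`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the regex approach with the key names as parameters.
class KeyFinder {

    // Returns the string value of `key` when it is the first entry of the
    // object under `parentKey`, or null when no match is found.
    // Assumes the value contains no escaped quotes.
    static String findStringValue(String text, String parentKey, String key) {
        String sep = "[\":\\s{]+"; // any run of quotes, colons, whitespace and '{'
        Pattern p = Pattern.compile(
                Pattern.quote(parentKey) + sep + Pattern.quote(key) + sep + "\"([^\"]*)\"");
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String script = "window._sharedData={ \"media\": { \"key\": \"abc\" } };";
        System.out.println(findStringValue(script, "media", "key")); // abc
    }
}
```

For the page in the question, `KeyFinder.findStringValue(html, "media", "key")` returns the same string as `m.group(1)` in the snippet above.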