Praxder

Reputation: 2749

Jsoup get contents of javascript that has CDATA tags?

I am using Jsoup to parse a webpage, but some of the info that I want is inside a CDATA tag, which prevents the parser from extracting the data inside it. How would I go about extracting data from within a CDATA tag? Example:

<script type='text/javascript'><!--// <![CDATA[
    OA_show('300x250');
// ]]> --></script>
         <script type='text/javascript'>alert("Hello");</script>

If I use Jsoup to parse this page and select all the matching elements with "script[type=text/javascript]", I get back the contents of the other scripts on the page that do not have CDATA tags (such as alert("Hello");), but not the OA_show('300x250'); value. How would I go about getting a value inside a CDATA tag with Jsoup?

Thanks!

Upvotes: 5

Views: 3744

Answers (1)

MariuszS

Reputation: 31547

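import org.jsoup.Jsoup;

String page = "<script type='text/javascript'><!--// <![CDATA[\n" +
        "    OA_show('300x250');\n" +
        "// ]]> --></script>\n" +
        "         <script type='text/javascript'>alert(\"Hello\");</script>";

// Take the body of the first <script> and strip the
// comment/CDATA wrapper by hand.
String html = Jsoup.parse(page).select("script").get(0).html();
html = html.replace("<!--// <![CDATA[", "");
html = html.replace("// ]]> -->", "");

// trim() drops the newlines and indentation left around the statement.
System.out.println(html.trim());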

Result

OA_show('300x250');
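
If the wrapper is not always written exactly the same way (with or without the HTML comment, different spacing), a regex may be more forgiving than two exact replace calls. This is only a rough sketch: it assumes you have already pulled the script body out with Jsoup (for example via Element.data() or html() as above), and the class and method names here are made up for illustration. It does not handle the /*<![CDATA[*/ block-comment style.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: strips a leading
// "<!--// <![CDATA[" style marker and a trailing "// ]]> -->" style marker.
class CdataStripper {
    // Optional "<!--", optional "//", then "<![CDATA[" at the start of the body.
    private static final Pattern OPEN =
            Pattern.compile("^\\s*(?:<!--)?\\s*(?://)?\\s*<!\\[CDATA\\[");
    // Optional "//", then "]]>", optional "-->" at the end of the body.
    private static final Pattern CLOSE =
            Pattern.compile("(?://)?\\s*\\]\\]>\\s*(?:-->)?\\s*$");

    public static String stripCdata(String scriptBody) {
        String s = OPEN.matcher(scriptBody).replaceFirst("");
        s = CLOSE.matcher(s).replaceFirst("");
        return s.trim();
    }

    public static void main(String[] args) {
        // Same script bodies as in the question.
        String wrapped = "<!--// <![CDATA[\n    OA_show('300x250');\n// ]]> -->";
        String plain = "alert(\"Hello\");";

        System.out.println(stripCdata(wrapped)); // OA_show('300x250');
        System.out.println(stripCdata(plain));   // alert("Hello");
    }
}
```

Scripts without a CDATA wrapper pass through unchanged, so the same helper can be applied to every element returned by select("script").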

Upvotes: 3
