am0wa

Reputation: 8377

How to retrieve the text in html CDATA section?

I have the following script element section in HTML:

<script type="text/x-markdown"><![CDATA[
# hello, This is Markdown Script Demo]]></script>

When I try to retrieve the inner content via scripttag.innerHTML, it returns the text including the <![CDATA[ ... ]]> parts.

Is there a more efficient way to retrieve the inner part of the CDATA section directly, instead of applying a regexp to strip the markers from the innerHTML?

Upvotes: 2

Views: 1888

Answers (3)

Meek

Reputation: 3348

This question is quite old, but this might help somebody.

You can probably use textContent.

For example, when parsing an RSS feed (an XML document) with a node that looks like this:

<title><![CDATA[This contains the title]]></title>

JavaScript:

const desc = el.querySelector('title').textContent;

Upvotes: 1

user663031

CDATA is an XML concept. It is a way of marking a section of text inside which anything that looks like markup or special XML characters is treated as plain text. It is essentially equivalent to escaping < as &lt;, & as &amp;, and so on, everywhere within the CDATA section.
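To illustrate the equivalence, here is a minimal sketch of the escaping that a CDATA wrapper saves you from doing by hand. The helper name escapeMarkupChars is hypothetical, and it only covers the three characters that matter for element content (&, < and >).

```javascript
// Escape the characters an XML parser would otherwise read as markup.
// "&" must be replaced first so the "&" in "&lt;" / "&gt;" is not escaped again.
function escapeMarkupChars(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const raw = "if (a < b && c > d) {}";

// In an XML document these two elements carry the same character data.
const asCdata = `<code><![CDATA[${raw}]]></code>`;
const asEscaped = `<code>${escapeMarkupChars(raw)}</code>`;

console.log(asCdata);
console.log(asEscaped);
```

One difference remains: a CDATA section cannot contain the sequence ]]>, while escaped text has no such restriction.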

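If the document is parsed as HTML (for example, served as text/html), the CDATA markers inside a script element receive no special processing and are just more characters. If the document is parsed as XHTML (served as application/xhtml+xml), the XML parser handles the CDATA section, and you can retrieve its contents as is, with no further ado.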
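A minimal sketch that copes with both cases for the script element in the question: read textContent, and strip a CDATA wrapper only if the HTML parser left one in place. The helper name readScriptText is hypothetical, and it assumes the whole script body is a single CDATA section, possibly surrounded by whitespace.

```javascript
// Return the text of a script-like element, minus a surrounding
// <![CDATA[ ... ]]> wrapper if one is present (HTML parsing keeps it,
// XHTML parsing has already removed it).
function readScriptText(el) {
  const text = el.textContent;
  // Anchored at both ends, so only a wrapper around the whole content is removed.
  const match = /^\s*<!\[CDATA\[([\s\S]*)\]\]>\s*$/.exec(text);
  return match ? match[1] : text;
}
```

In a browser you would pass it the element itself, e.g. readScriptText(document.querySelector('script[type="text/x-markdown"]')). Because it only needs a textContent property, it can also be exercised with a plain object outside the browser.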

Upvotes: 0

taxicala

Reputation: 21759

I don't think you will be able to retrieve only what's inside the CDATA, since it is not a tag but plain text. When you get the innerHTML of the tag, you get everything as a single string, so a regexp is the only way I see to get what's inside.
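A minimal sketch of the regexp approach, assuming the input is the raw text of the script element (its innerHTML or textContent). The function name extractCdata is hypothetical; it returns the contents of every CDATA section found, in order.

```javascript
// Collect the contents of all <![CDATA[ ... ]]> sections in a string.
// The lazy [\s\S]*? stops at the first "]]>", which is safe because
// a CDATA section cannot itself contain "]]>".
function extractCdata(source) {
  const pattern = /<!\[CDATA\[([\s\S]*?)\]\]>/g;
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}
```

If the string contains no CDATA section, the result is an empty array, so callers can fall back to the original text.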

Upvotes: 1
