Garbit

YQL - CDATA "]]>" error when selecting data

I'm trying to scrape data from totalfilm.com using YQL, but I'm getting a strange error:

"The character sequence "]]>" must not appear in content unless used to mark the end of a CDATA section."

select * from html where url="www.totalfilm.com"
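The message comes from a general XML well-formedness rule: outside a CDATA section, the literal sequence "]]>" may not appear in text content. The following minimal Python sketch uses the standard library's xml.etree.ElementTree, not YQL's own parser, to show that a strict XML parser accepts "]]>" when it closes a real CDATA section but rejects it in ordinary text. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # "]]>" used as the end of a genuine CDATA section is allowed.
    print(is_well_formed("<p><![CDATA[a < b]]></p>"))
    # A stray "]]>" in ordinary text content violates the rule.
    print(is_well_formed("<p>a ]]> b</p>"))
```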

Answers (1)

salathe

As mentioned in the comments, the page's XHTML is broken, so some fudging is needed to get it working the way you would like.

Here is a quick, very crude open data table that strips any <![CDATA[ and ]]> markers from an (X)HTML page (and also runs it through Tidy) before applying an optional XPath expression, as the normal html table does, to get at the data you need.
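The strip-then-select idea can be sketched outside YQL. The Python below is a hypothetical illustration, not the code of the nocdata.xml table: it removes every <![CDATA[ and ]]> marker with plain string replacement, parses the result with xml.etree.ElementTree, and then applies an optional path expression. It assumes the markup is already well-formed once the markers are gone (no Tidy step is performed), and ElementTree supports only a limited subset of XPath, so a path such as ".//p" stands in for a full XPath expression. The names strip_cdata_markers and select_nodes are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def strip_cdata_markers(markup: str) -> str:
    """Crudely drop every CDATA open/close marker, keeping the text between them."""
    return markup.replace(CDATA_OPEN, "").replace(CDATA_CLOSE, "")


def select_nodes(markup: str, path: Optional[str] = None) -> List[ET.Element]:
    """Strip CDATA markers, parse, and optionally filter with an ElementTree path.

    Hypothetical helper: assumes the markup is well-formed XML after stripping,
    since unlike the YQL table it does not run Tidy first.
    """
    root = ET.fromstring(strip_cdata_markers(markup))
    if path is None:
        # No path given: return the parsed root element unchanged.
        return [root]
    return root.findall(path)


if __name__ == "__main__":
    page = (
        "<html><body>"
        "<p>one ]]> two</p>"
        "<p><![CDATA[three]]></p>"
        "</body></html>"
    )
    for node in select_nodes(page, ".//p"):
        print(node.text)
```

Because the replacement is blind, a CDATA section that was protecting "<" or "&" characters loses that protection once its markers are gone, so this only works as a workaround for pages like the one in the question.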

You can use it like this:

use "https://github.com/salathe/yql-tables/raw/examples/data/nocdata.xml" as html;
select * from html where url="www.totalfilm.com"
