Reputation: 6046
I'm trying to scrape data from totalfilm.com using YQL, but I'm getting a strange error:
"The character sequence "]]>" must not appear in content unless used to mark the end of a CDATA section."
select * from html where url="www.totalfilm.com"
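The message comes from the XML parser. XML 1.0 forbids the literal sequence ]]> in ordinary character data; it may only close a CDATA section. The minimal Python reproduction below uses the standard library's ElementTree. The two snippets are made up and stand in for the real page: the first has a stray ]]> and is rejected, and the second is the same text with a matching <![CDATA[ opener and parses fine.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: a stray "]]>" in plain text content,
# not closing any CDATA section, so it is not well-formed XML.
broken = "<div><p>Top 10 films ]]> this week</p></div>"

# The same text where "]]>" really does close a CDATA section.
valid = "<div><p><![CDATA[Top 10 films ]]> this week</p></div>"

try:
    ET.fromstring(broken)
    error = None
except ET.ParseError as exc:
    error = exc  # the parser refuses the stray "]]>"

parsed = ET.fromstring(valid)  # no error: "]]>" ends the CDATA section
print(error)
print(parsed.find("p").text)
```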
Upvotes: 0
Views: 461
Reputation: 51950
As noted in the comments, the page is broken XHTML, so some fudging is needed to get it to parse the way you would like.
Here is a quick, very crude open data table that strips every <![CDATA[ and ]]> marker from an (X)HTML page (and also runs it through Tidy) before applying an optional XPath expression, as the normal html table does, to get at the data you need.
You can use it like:
use "https://github.com/salathe/yql-tables/raw/examples/data/nocdata.xml" as html;
select * from html where url="www.totalfilm.com"
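To show roughly what the table does to the markup, here is a Python approximation. It is not the table's actual code. It removes every <![CDATA[ and ]]> marker, then applies an optional path expression. Assumptions: the helper names strip_cdata_markers and select and the sample page are hypothetical, and ElementTree supports only a subset of XPath. The sketch also skips the Tidy step. On a real page, removing <![CDATA[ can expose raw < or & characters, for example in scripts. Those characters would cause a different parse error, which is one reason the table tidies the markup as well.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional

# Both markers the open data table is described as stripping.
CDATA_MARKERS = re.compile(r"<!\[CDATA\[|\]\]>")


def strip_cdata_markers(markup: str) -> str:
    """Remove every '<![CDATA[' and ']]>' marker, keeping the text around them."""
    return CDATA_MARKERS.sub("", markup)


def select(markup: str, path: Optional[str] = None) -> List[ET.Element]:
    """Parse the cleaned markup and apply an optional ElementTree path.

    Hypothetical helper: unlike the real table, no Tidy pass is done here,
    so the input must already be well-formed apart from the CDATA markers.
    """
    root = ET.fromstring(strip_cdata_markers(markup))
    if path is None:
        return [root]  # no expression: return the whole document
    return root.findall(path)


# Hypothetical page containing a stray "]]>".
page = "<html><body><p class='title'>Top 10 films ]]> this week</p></body></html>"
titles = select(page, ".//p[@class='title']")
print([t.text for t in titles])
```

Without the stripping step, ET.fromstring(page) fails with the same kind of error as the original query.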
Upvotes: 2