user1927468

Reputation: 1133

Read RDF/XML from a URL in Jena

I'm trying to read an RDF/XML file using Jena, and normally it works.

    final String url = "http://www.bbc.co.uk/nature/life/Human";
    Model model = ModelFactory.createDefaultModel();       
    model.read(url, "RDF/XML");

But when I try another URL, where a paragraph contains a <br> or a link, it gives me this error:

Exception in thread "main" org.apache.jena.riot.RiotException: [line: 25, col: 6 ] {E202} Cannot have both string data "Great white sharks are at the very top of the marine food chain. Feared as man-eaters, they are only responsible for about 5-10 attacks a year, which are rarely fatal. Great whites are ultimate predators. Powerful streamlined bodies and a mouth full of terrifyingly sharp, serrated teeth, combine with super senses that can detect a single drop of blood from over a mile away. Hiding from a great white isn't an option as they can detect and home in on small electrical discharges from hearts and gills. Unlike most other sharks, live young are born that immediately swim away.
" and XML data <br> inside a property element. Maybe you want rdf:parseType='Literal'.

This is the URL for the second case, where Jena throws this error: http://www.bbc.co.uk/nature/life/Great_white_shark

What should I do to make it ignore that?

Upvotes: 1

Views: 359

Answers (1)

AndyS

Reputation: 16680

The problem is in the data at the BBC site: the <br/> needs to be escaped as &lt;br/&gt; to put the HTML markup into the string value. In RDF/XML, a plain string value cannot contain raw markup.

Unfortunately, the BBC site does not fully support content negotiation: asking for Turtle or N-Triples returns an XHTML page.

You will need to download the file with a regular HTTP request, with the header Accept: application/rdf+xml, patch up the content, and parse the fixed version. One way to do this is to read it into a Java string, use a regex to replace <br/> with &lt;br/&gt;, and then parse from the string.
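
The download and patch steps could look roughly like the sketch below. It is only an illustration, not tested against the BBC site: the class name RdfXmlPatch and its two method names are made up for this example, it assumes Java 11+ for java.net.http, and it assumes the markup appears as a self-closing <br/> (with optional whitespace before the slash). It does not deal with embedded links, which would need their own pattern.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
final class RdfXmlPatch {
    // Matches self-closing forms such as <br/> and <br />.
    private static final Pattern BR = Pattern.compile("<br\\s*/>");

    // Downloads the document as a string, asking explicitly for RDF/XML.
    static String fetchRdfXml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/rdf+xml")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Escapes raw <br/> elements so they become part of the string value.
    static String escapeBr(String rdfXml) {
        return BR.matcher(rdfXml).replaceAll("&lt;br/&gt;");
    }
}
```

With Jena on the classpath, the patched string can then be parsed with model.read(new StringReader(fixed), url, "RDF/XML"), where fixed is the result of RdfXmlPatch.escapeBr(RdfXmlPatch.fetchRdfXml(url)) and the URL is passed as the base URI.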

Upvotes: 2
