lkallas

Getting RDF element's attribute value from XML file using PHP

I am trying to get the value of the 'rdf:resource' attribute of each 'rdf:li' element in this XML: http://www.ecb.europa.eu/rss/fxref-usd.html

What is the correct way to achieve that? How could one parse these RDF elements correctly?

This is what I have so far:

<!DOCTYPE html>
<html>

    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>RDF</title>
    </head>

    <body>

    <ul> 
 <?php      

            $rdf = file_get_contents('http://www.ecb.europa.eu/rss/fxref-usd.html');


            $rdf = str_replace('rdf:', 'rdf_', $rdf);


            $xml = simplexml_load_string($rdf);


            foreach ($xml->channel->items->rdf_Seq->rdf_li as $item) {
                $attributes = $item->attributes();              

                if(isset($attributes['rdf_resource'])) {
                    $url = htmlspecialchars((string) $attributes['rdf_resource']); // escape & and quotes for HTML
                    echo '<li><a href="'.$url.'" target="_blank">'.$url.'</a></li>';
                }
            }
?>
    </ul>

    </body>

</html>

As you can see, this is kind of a hack, and I don't believe it is the correct way.
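For comparison, SimpleXML can resolve the namespaces itself, so the str_replace step isn't needed. This is only an illustrative sketch: it assumes the feed has the structure of the cleaned-up RDF/XML shown in the answer (RSS 1.0 default namespace, rdf:Seq inside items), it uses a trimmed inline copy instead of fetching the live URL, and the function name rdfLiResources is made up. It still works purely at the XML level, so the answer's caveats about RDF/XML serializations apply to it as well.

```php
<?php
// Sketch only: read rdf:resource from each rdf:li without rewriting "rdf:".
// Assumption: the feed is shaped like the cleaned-up RDF/XML in the answer
// (RSS 1.0 default namespace, rdf:Seq inside <items>).
const RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const RSS_NS = 'http://purl.org/rss/1.0/';

function rdfLiResources(string $xmlText): array
{
    $xml = simplexml_load_string($xmlText);
    if ($xml === false) {
        throw new RuntimeException('Could not parse the XML');
    }
    // Bind XPath prefixes to namespace URIs; they need not match the
    // prefixes used inside the document.
    $xml->registerXPathNamespace('rdf', RDF_NS);
    $xml->registerXPathNamespace('rss', RSS_NS);

    $urls = [];
    $lis = $xml->xpath('/rdf:RDF/rss:channel/rss:items/rdf:Seq/rdf:li') ?: [];
    foreach ($lis as $li) {
        // Namespaced attributes are only visible via attributes($namespaceUri).
        $attrs = $li->attributes(RDF_NS);
        if (isset($attrs['resource'])) {
            $urls[] = (string) $attrs['resource'];
        }
    }
    return $urls;
}

// Trimmed inline copy of the feed (two items only).
$sample = <<<'XML'
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <channel rdf:about="http://www.ecb.europa.eu/rss/usd.html">
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&amp;rate=1.1362" />
        <rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&amp;rate=1.1254" />
      </rdf:Seq>
    </items>
  </channel>
</rdf:RDF>
XML;

foreach (rdfLiResources($sample) as $url) {
    $safe = htmlspecialchars($url); // escape & and quotes for HTML output
    echo '<li><a href="' . $safe . '" target="_blank">' . $safe . '</a></li>', "\n";
}
```

The XPath prefixes are just local bindings, so the code keeps working if the feed ever switches to a different prefix for the same RDF namespace URI.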

Any help is appreciated!

Answers (1)

Joshua Taylor

I am trying to get attribute 'rdf:resource' value from 'rdf:li' element from this XML: http://www.ecb.europa.eu/rss/fxref-usd.html

First, that's not actually legal RDF, at least according to Jena's parser. After removing the XML Schema location attribute, which evidently isn't allowed on the rdf:RDF element, I still get this error:

Expecting XML start or end element(s). String data "U2" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.

But even if it were legal RDF/XML, there are two issues with your approach that will end up making it kind of brittle. The first is that it's very difficult to reliably process RDF/XML with XML tools, as explained in my answer to "How to access OWL documents using XPath in Java?". In general, the same RDF graph can be serialized as many different RDF/XML documents. For rdf:li this is especially important: the RDF graph doesn't actually contain any rdf:li properties, even though there are rdf:li elements in the XML document. Have a look at this section of the RDF/XML Syntax Specification:

2.15 Container Membership Property Elements: rdf:li and rdf:_n

RDF has a set of container membership properties and corresponding property elements that are mostly used with instances of the rdf:Seq, rdf:Bag and rdf:Alt classes which may be written as typed node elements. The list properties are rdf:_1, rdf:_2 etc. and can be written as property elements or property attributes as shown in Example 17. There is an rdf:li special property element that is equivalent to rdf:_1, rdf:_2 in order, explained in detail in section 7.4. The mapping to the container membership properties is always done in the order that the rdf:li special property elements appear in XML — the document order is significant. The equivalent RDF/XML to Example 17 written in this form is shown in Example 18.

That means that an RDF/XML snippet like this (not quite legal, but it gives the general impression):

<ex:Collection>
  <rdf:li rdf:about="member1"/>
  <rdf:li rdf:about="member2"/>
</ex:Collection>

could also be written as:

<ex:Collection>
  <rdf:_2 rdf:about="member2"/>
  <rdf:_1 rdf:about="member1"/>
</ex:Collection>

That means that any purely XML-based approach here is probably going to be brittle, because it will depend on a structure that isn't guaranteed to always be represented the same way.
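If you do stay at the XML level, one way to at least tolerate both spellings above is to number rdf:li elements in document order and read the number directly from rdf:_n elements, as the quoted specification describes. This is a hypothetical sketch (containerMembers and the example.org URIs are invented for illustration) that assumes every member element carries an rdf:resource attribute. It does not handle other variations, such as membership written as property attributes or as nested node elements, which is exactly why an RDF parser is the safer route.

```php
<?php
// Sketch only: compute container positions for the members of a container
// element, whether they are written as rdf:li or as explicit rdf:_n.
// Assumption: every member element carries an rdf:resource attribute.
const RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

function containerMembers(SimpleXMLElement $container): array
{
    $members = [];
    $liCounter = 0; // each rdf:li takes the next number: rdf:_1, rdf:_2, ...
    foreach ($container->children(RDF_NS) as $child) {
        $name = $child->getName(); // local name, e.g. "li" or "_2"
        if ($name === 'li') {
            $n = ++$liCounter;
        } elseif (preg_match('/^_([1-9][0-9]*)$/', $name, $m) === 1) {
            $n = (int) $m[1];
        } else {
            continue; // some other rdf: element, not a membership property
        }
        $attrs = $child->attributes(RDF_NS);
        if (isset($attrs['resource'])) {
            $members[$n] = (string) $attrs['resource'];
        }
    }
    ksort($members); // order by position, not by where the element appeared
    return $members;
}

$withLi = <<<'XML'
<ex:Collection xmlns:ex="http://example.org/ns#"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:li rdf:resource="http://example.org/member1"/>
  <rdf:li rdf:resource="http://example.org/member2"/>
</ex:Collection>
XML;

$withNumbers = <<<'XML'
<ex:Collection xmlns:ex="http://example.org/ns#"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:_2 rdf:resource="http://example.org/member2"/>
  <rdf:_1 rdf:resource="http://example.org/member1"/>
</ex:Collection>
XML;

print_r(containerMembers(simplexml_load_string($withLi)));
print_r(containerMembers(simplexml_load_string($withNumbers)));
```

Both documents should produce the same position-to-resource map, which is the point the answer makes: the graph only knows about rdf:_1 and rdf:_2, not about how they were written.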

Usually the answer is to query with an RDF query language, so that you can query at the RDF level. The standard RDF query language is SPARQL. Unfortunately, since there are literally infinitely many properties (rdf:_1, rdf:_2, …), it's hard to do this efficiently in SPARQL too: you end up needing to match URIs that look like rdf:_xxx and then figure out what comes after the underscore.

OK, so if you can get the RDF/XML into a legal format, you might end up with something like:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:cb="http://www.cbwiki.net/wiki/index.php/Specification_1.1" xmlns:dc = "http://purl.org/dc/elements/1.1/" xmlns:dcterms = "http://purl.org/dc/terms/" xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance">
<channel  rdf:about = "http://www.ecb.europa.eu/rss/usd.html">
<title>ECB | US dollar (USD) - Euro foreign exchange reference rates</title>  
<link>http://www.ecb.europa.eu/home/html/rss.en.html</link>
<description>The reference rates are based on the regular daily concertation procedure between central banks within and outside the European System of Central Banks, which normally takes place at 2.15 p.m. (14:15) ECB time.</description>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&amp;rate=1.1362" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&amp;rate=1.1254" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&amp;rate=1.1266" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&amp;rate=1.1224" />
<rdf:li rdf:resource="http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&amp;rate=1.1236" />
</rdf:Seq>
</items>
</channel>
</rdf:RDF>

Now, remember, those rdf:li XML elements don't mean that there are rdf:li properties in the graph; instead, there are a bunch of rdf:_n properties. In the Turtle serialization (which is similar to SPARQL syntax), the data is:

@prefix :      <http://purl.org/rss/1.0/> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix cb:    <http://www.cbwiki.net/wiki/index.php/Specification_1.1> .
@prefix dc:    <http://purl.org/dc/elements/1.1/> .
@prefix xsi:   <http://www.w3.org/2001/XMLSchema-instance> .

<http://www.ecb.europa.eu/rss/usd.html>
        a             :channel ;
        :description  "The reference rates are based on the regular daily concertation procedure between central banks within and outside the European System of Central Banks, which normally takes place at 2.15 p.m. (14:15) ECB time." ;
        :items        [ a       rdf:Seq ;
                        rdf:_1  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&rate=1.1362> ;
                        rdf:_2  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&rate=1.1254> ;
                        rdf:_3  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&rate=1.1266> ;
                        rdf:_4  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&rate=1.1224> ;
                        rdf:_5  <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&rate=1.1236>
                      ] ;
        :link         "http://www.ecb.europa.eu/home/html/rss.en.html" ;
        :title        "ECB | US dollar (USD) - Euro foreign exchange reference rates" .

What I would do at this point is look for the :items property of your channel, check that its value is an rdf:Seq, and then either take all of its properties except rdf:type and just assume that they're rdf:_n properties, or actually select only the rdf:_xxx properties and their values. The first approach would look like:

prefix :      <http://purl.org/rss/1.0/>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select ?item {
  <http://www.ecb.europa.eu/rss/usd.html> :items ?x .
  ?x a rdf:Seq .
  ?x ?p ?item .
  filter (?p != rdf:type)
}
--------------------------------------------------------------------------------------------------------------------
| item                                                                                                             |
====================================================================================================================
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&rate=1.1236> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&rate=1.1224> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&rate=1.1266> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&rate=1.1254> |
| <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&rate=1.1362> |
--------------------------------------------------------------------------------------------------------------------

Or, the second approach (actually checking for rdf:_):

prefix :      <http://purl.org/rss/1.0/>
prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd:   <http://www.w3.org/2001/XMLSchema#>

select ?n ?item {
  <http://www.ecb.europa.eu/rss/usd.html> :items ?x .
  ?x a rdf:Seq .
  ?x ?p ?item .

  # check that ?p starts with rdf:_
  filter strstarts(str(?p),str(rdf:_))

  # and extract the part after rdf:_ and convert
  # it to an integer
  bind (xsd:integer(strafter(str(?p),str(rdf:_))) as ?n)
}
------------------------------------------------------------------------------------------------------------------------
| n | item                                                                                                             |
========================================================================================================================
| 5 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-05&rate=1.1236> |
| 4 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-06&rate=1.1224> |
| 3 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-07&rate=1.1266> |
| 2 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-08&rate=1.1254> |
| 1 | <http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html?date=2015-10-09&rate=1.1362> |
------------------------------------------------------------------------------------------------------------------------
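If the PHP library you end up with hands you the (predicate, object) pairs for the rdf:Seq node instead of running SPARQL for you, the same filter and bind steps are easy to do by hand. A sketch, assuming a plain array of pairs as input; that input shape and the function name seqItemsByPosition are hypothetical, not any particular library's API. Also note that SPARQL doesn't guarantee any result order without ORDER BY, which is why the results above come back in reverse order; adding order by ?n to the query gives positional order, and ksort does the same here.

```php
<?php
// Sketch only: the filter/bind steps of the second query, applied in PHP to
// (predicate, object) pairs for the rdf:Seq node. The pair-array input is a
// hypothetical stand-in for whatever an RDF library would hand back.
const RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';

function seqItemsByPosition(array $pairs): array
{
    $prefix = RDF_NS . '_';
    $items = [];
    foreach ($pairs as [$predicate, $object]) {
        // filter strstarts(str(?p), str(rdf:_))
        if (strncmp($predicate, $prefix, strlen($prefix)) !== 0) {
            continue; // e.g. rdf:type
        }
        // bind (xsd:integer(strafter(str(?p), str(rdf:_))) as ?n)
        $suffix = substr($predicate, strlen($prefix));
        if (preg_match('/^[1-9][0-9]*$/', $suffix) !== 1) {
            continue; // skip suffixes that are not a plain positive integer
        }
        $items[(int) $suffix] = $object;
    }
    ksort($items); // same effect as adding "order by ?n" to the query
    return $items;
}

$base = 'http://www.ecb.europa.eu/stats/exchange/eurofxref/html/eurofxref-graph-usd.en.html';
$pairs = [
    [RDF_NS . 'type', RDF_NS . 'Seq'],
    [RDF_NS . '_3', $base . '?date=2015-10-07&rate=1.1266'],
    [RDF_NS . '_1', $base . '?date=2015-10-09&rate=1.1362'],
    [RDF_NS . '_2', $base . '?date=2015-10-08&rate=1.1254'],
];

foreach (seqItemsByPosition($pairs) as $n => $item) {
    echo $n, ' ', $item, "\n";
}
```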

Now you just need a SPARQL library for PHP. I'm not really a PHP user, so I can't recommend one, but I know that there are some other questions on Stack Overflow about PHP and SPARQL, and that there are some libraries out there.
