Greg Cox

Reputation: 287

Why does SPARQL LIMIT on Inner Query seem to limit the outer query under RDF4J?

I'm using the RDF4J 2.2.2 Workbench and Server under Windows 10. My application accumulates an event concept from a series of correlated reports, each of which has a time stamp as one of its properties. I need an inner query with an ORDER BY and a LIMIT to get the latest time stamp from the reports contributing to each event; the event itself is established by a triple in the outer query. Since the full application is rather complex, I've come up with a simple case to illustrate my question. I expect the outer query to produce several results, each with the inner query limited to 1, but the LIMIT seems to be applied to the outer query. In the example below I expect two results but get only one.

The example case is set up in the RDF4J Workbench using a repository with RDFS+SPIN support.

  1. Clear the repository (RDF4J workbench Modify/Clear).
  2. Load the Nuvio ontology version 1.0.0 using the workbench Modify/Add function.
  3. Set up the test condition by running the following SPARQL update with the RDF4J Modify/SPARQL Update function:

    PREFIX Nuvio: <http://cogradio.org/ont/Nuvio.owl#>
    PREFIX inst: <http://www.disa.mil/dso/a2i/ontologies/PBSM/Sharing/Instantiations#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    
    INSERT {
        inst:aTestObject1 Nuvio:hasValue _:b0 ;
            Nuvio:hasValue _:b1 .
        inst:aTestObject2 Nuvio:hasValue _:b2 ;
            Nuvio:hasValue _:b3 .
    
        _:b0 Nuvio:hasDataValue "2017-11-13T13:46:00.000-06:00"^^xsd:dateTime .
        _:b1 Nuvio:hasDataValue "2017-11-13T13:46:01.000-06:00"^^xsd:dateTime .
        _:b2 Nuvio:hasDataValue "2017-11-13T13:46:02.000-06:00"^^xsd:dateTime .
        _:b3 Nuvio:hasDataValue "2017-11-13T13:46:03.000-06:00"^^xsd:dateTime .
    }
    WHERE {
    }
    
  4. Now run the following SPARQL Query using the RDF4J Workbench Explore/Query function:

    PREFIX Nuvio: <http://cogradio.org/ont/Nuvio.owl#>
    
    SELECT DISTINCT *
    WHERE {
      ?o a Nuvio:Quantity .
      ?o Nuvio:hasValue/Nuvio:hasDataValue ?value .
    }
    

    which produces the expected four results (both time stamps for both test individuals):

    [Screenshot: results as expected from first query]

  5. Now attempt to limit the results to one time stamp per individual (inst:aTestObject1 and inst:aTestObject2) using the following query containing an inner query (a simple extension of the first query):

    PREFIX Nuvio: <http://cogradio.org/ont/Nuvio.owl#>
    
    SELECT DISTINCT *
    WHERE {
      ?o a Nuvio:Quantity .
      {
        SELECT DISTINCT *
        WHERE {
          ?o Nuvio:hasValue/Nuvio:hasDataValue ?value .
        } LIMIT 1
      }
    }
    

    which produces only one result:

    [Screenshot: unexpected single result from second query when two results expected]

    I'm expecting two results, one for each of inst:aTestObject1 and inst:aTestObject2, since each has two time stamps. But I only get a result for inst:aTestObject2. Why only one?

Upvotes: 1

Views: 127

Answers (1)

Greg Cox

Reputation: 287

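Following @AKSW's comment that the inner select is always evaluated first, and recovering from my brain fade: the inner query with LIMIT 1 yields a single row overall, not one row per ?o, so the join with the outer pattern can only ever produce one result. What I wanted (each test object with its latest xsd:dateTime value) can be achieved using the following simple query: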

    PREFIX Nuvio: <http://cogradio.org/ont/Nuvio.owl#>

    SELECT DISTINCT ?o (MAX(?value) AS ?maxValue)
    WHERE {
      ?o a Nuvio:Quantity .
      ?o Nuvio:hasValue/Nuvio:hasDataValue ?value .
    } GROUP BY ?o

Which returns the desired two results:

[Screenshot: desired query results]
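
If the inner-query shape has to be kept, for example because the outer query joins the latest time stamp with other patterns as in the event/report application described in the question, the aggregate can probably be moved into the subquery instead. The following is only a sketch against the test data from the question, not something taken from the original application. Because the subquery is evaluated on its own, it has to do the per-object grouping itself rather than rely on the outer `?o` binding:

```sparql
PREFIX Nuvio: <http://cogradio.org/ont/Nuvio.owl#>

SELECT ?o ?maxValue
WHERE {
  # Outer pattern: selects the individuals of interest.
  ?o a Nuvio:Quantity .
  {
    # Inner query: evaluated independently of the outer pattern,
    # so it groups by ?o to produce one row per object.
    SELECT ?o (MAX(?value) AS ?maxValue)
    WHERE {
      ?o Nuvio:hasValue/Nuvio:hasDataValue ?value .
    }
    GROUP BY ?o
  }
}
```

The subquery projects `?o`, so its rows join with the outer pattern on that variable, and each test object should appear once with its latest time stamp.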

Thanks @AKSW.

Upvotes: 2
