Wolfgang

Reputation: 221

SPARQL variable bound outside a block is not bound inside the block

In a SPARQL query against a Sesame in-memory store, I would like to separate my query conditions, e.g. "Patient is male", from the rest of my query that produces the query result.

Let's consider this simple query:

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX nci:<http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX roo: <http://www.cancerdata.org/roo/>

SELECT DISTINCT *
WHERE { 
  ?Patient a nci:C16960 . 
  ?Patient roo:100018 ?gender .
  ?gender a nci:C20197 .
  BIND(bound(?Patient) as ?bound0) .
  { 
    BIND(bound(?Patient) as ?bound1) .
  } 
  UNION { 
    ?Patient a nci:C16960 . 
    BIND(bound(?Patient) as ?bound2) .
  } 
} 

Bound0 is always true (as expected). Bound1 is always false and Bound2 has no value in the result set at all (with and without the extra ?Patient triple).

Does that mean that the ?Patient variable is known in the first block, but not bound and that it does not exist at all in the 2nd block (even though I am declaring it again)? Why would there be a difference?

My goal is to define complex query conditions first and reduce the result set to only those patients that match all conditions. Then use the already reduced ?Patient to gather all the data that is requested in the output.

Thanks in advance for any help!

EDIT

To illustrate my intent further, here is a more complex query. I could write the same query with OPTIONAL instead of UNION. But when I use OPTIONAL, the number of rows in the result set keeps growing fast the more hits each OPTIONAL block returns. If both blocks return multiple hits, then the result contains hits1*hits2 records. I thought I could make this more predictable and keep it within hits1+hits2 by using UNION at a high level. Is there a significant performance difference?

SELECT DISTINCT ?Patient ?Gender ?Neoplastic_Process ?MStage ?TStage
WHERE { 
 ?Patient a nci:C16960 . 
     { 
     ?Patient roo:100018 ?Gender . 
     ?Gender a ?GenderType . 
    } 
    UNION { # SHOULD I USE OPTIONAL HERE INSTEAD (and for the block above)?
     ?Patient roo:100008 ?Neoplastic_Process . 
     ?Neoplastic_Process a ?Neoplastic_ProcessType . 
        OPTIONAL { 
         ?Neoplastic_Process roo:100241 ?MStage . 
         ?MStage a ?MStageType . 
        } 
        OPTIONAL { 
         ?Neoplastic_Process roo:100244 ?TStage . 
         ?TStage a ?TStageType . 
        } 
    } 
}

I suppose when I use OPTIONAL to add to the result set instead of UNION, my problem with separating the query conditions does not come into play since it is all part of the same block.

Upvotes: 1

Views: 556

Answers (2)

AndyS

Reputation: 16630

Answer to the edited question.

Look at this part:

    OPTIONAL { 
     ?Neoplastic_Process roo:100241 ?MStage . 
     ?MStage a ?MStageType . 
    } 
    OPTIONAL { 
     ?Neoplastic_Process roo:100244 ?TStage . 
     ?TStage a ?TStageType . 
    }

This will yield the cross product of all the matching ?MStage/?MStageType with ?TStage/?TStageType.
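
A small self-contained query may make the multiplication easier to see. This is only a sketch: the `ex:` prefix and the resources `ex:np1`, `ex:M0`, `ex:T1` and so on are made up for illustration, and inline VALUES data (SPARQL 1.1) stands in for the real triples, so it should run against any dataset, even an empty one.

```sparql
PREFIX ex: <http://example.org/>

# Hypothetical data: one neoplastic process with 2 M stages and 3 T stages.
SELECT ?Neoplastic_Process ?MStage ?TStage
WHERE {
  VALUES ?Neoplastic_Process { ex:np1 }
  OPTIONAL {
    VALUES (?Neoplastic_Process ?MStage) { (ex:np1 ex:M0) (ex:np1 ex:M1) }
  }
  OPTIONAL {
    VALUES (?Neoplastic_Process ?TStage) { (ex:np1 ex:T1) (ex:np1 ex:T2) (ex:np1 ex:T3) }
  }
}
# Each of the 2 M-stage rows is paired with each of the 3 T-stage rows: 6 rows.
```

Every extra match in either OPTIONAL multiplies the row count rather than adding to it.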

I don't know the shape of your data but maybe this simplification will be a step on the way: you can take out all things not in the SELECT and simplify:

SELECT ?Patient ?Gender ?Neoplastic_Process ?MStage ?TStage
WHERE {
    ?Patient a nci:C16960 .
    ?Patient roo:100018 ?Gender .
    ?Patient roo:100008 ?Neoplastic_Process .
    OPTIONAL { ?Neoplastic_Process roo:100241 ?MStage . }
    OPTIONAL { ?Neoplastic_Process roo:100244 ?TStage . }
}

Depending on what the data looks like, and whether you need two different variables for ?MStage and ?TStage, other things might be possible.

Upvotes: 1

AndyS

Reputation: 16630

SPARQL evaluation is defined to be functional, i.e. bottom-up. In the same way, (1+2) * 3 means calculate 1+2 first and then multiply, and double(1+2) means calculate 1+2 and pass 3, not the expression 1+2, to double.

{ 
    BIND(bound(?Patient) as ?bound1) .
}

is calculated then combined (with a join) with

?Patient a nci:C16960 . 
?Patient roo:100018 ?gender .
?gender a nci:C20197 .

So

{ 
    BIND(bound(?Patient) as ?bound1) .
}

has ?Patient unbound: nothing inside that block binds it.
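
The same effect can be reproduced without any data. In this sketch the `ex:` prefix and `ex:p1` are hypothetical, and VALUES (SPARQL 1.1) stands in for the triple patterns that bind ?Patient in the outer block.

```sparql
PREFIX ex: <http://example.org/>

SELECT ?Patient ?bound0 ?bound1
WHERE {
  VALUES ?Patient { ex:p1 }            # outer block binds ?Patient
  BIND(bound(?Patient) AS ?bound0)     # evaluated over the outer block so far
  {
    BIND(bound(?Patient) AS ?bound1)   # inner block is evaluated on its own first
  }
}
# Per the SPARQL 1.1 algebra: one row, ?bound0 = true and ?bound1 = false.
```

The inner block produces a single solution containing only ?bound1, which is then joined with the outer solution. By that point the value of ?bound1 has already been fixed.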

Engines do analyze the query and may execute it internally in a different order, but only if that gives the same results.

It may be easier to repeat smaller patterns in the different parts of a UNION. For larger patterns, to avoid repetition, consider sub-SELECTs or OPTIONAL to extend results. (The example in the question looks like a simplification, so it's hard to tell the best way.)
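
As a rough sketch of the sub-SELECT idea, using the predicates from the question: the inner SELECT holds the conditions and returns only the matching patients, and each UNION branch then joins to it on ?Patient. It assumes the store supports SPARQL 1.1 subqueries. The gender condition is copied from the first query purely as an example, and ?g is a helper variable introduced here.

```sparql
PREFIX nci: <http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#>
PREFIX roo: <http://www.cancerdata.org/roo/>

SELECT ?Patient ?Gender ?Neoplastic_Process
WHERE {
  {
    # Conditions only: yields the reduced set of patients.
    SELECT DISTINCT ?Patient
    WHERE {
      ?Patient a nci:C16960 .
      ?Patient roo:100018 ?g .
      ?g a nci:C20197 .
    }
  }
  # Output data: each branch joins to the subquery result on ?Patient.
  { ?Patient roo:100018 ?Gender . }
  UNION
  { ?Patient roo:100008 ?Neoplastic_Process . }
}
```

Because ?Patient appears inside every UNION branch, it is bound there by the branch's own triple pattern, and the join with the subquery keeps only the patients that met the conditions. Whether this is faster than repeating the conditions in each branch will depend on the engine and the data.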

Upvotes: 5
