Tim

Reputation: 959

SPARQL Join two graphs : value precedence?

I have two graphs that contain values for enrollment in clinical trials. Each graph has incomplete records for enrollment so I want to combine the graphs to get a more complete listing of the enrollment values.

The KMD graph should take precedence. If enrollment is present in both the KMD graph and the KCTGOV graph, use the value from KMD. If enrollment is missing in KMD, use the enrollment value from KCTGOV.

I am getting close with the query below: I bring in the enrollment values from each graph by joining on the value of ?nctId, and that join works. How do I then create a single ?enrollment result that comes from KMD when the value is present in that graph, and from KCTGOV when it is missing in KMD? The query below returns separate enrollment columns, ?enrollkmd and ?enrollKCT. I need to merge those columns into one.

Suggestions greatly appreciated!

PREFIX kmd:   <http://www.example.org/kmd/>
PREFIX lct:  <http://data.linkedct.org/vocab/resource/>

SELECT *
FROM NAMED <http://localhost:8890/KMD>
FROM NAMED <http://localhost:8890/KCTGOV>
WHERE
{
    GRAPH <http://localhost:8890/KMD>
    {
        ?obs a kmd:Study ;
               kmd:hasOrgId  ?orgId .
        OPTIONAL
        {
            ?obs kmd:hasNctId  ?nctIdURI .
        }
        OPTIONAL {?obs kmd:hasEnrollment  ?enrollkmd.}
        # Create STR of NCTID for merge
        BIND(strafter(str(?nctIdURI), "kmd/") AS ?nctId )
    }
    OPTIONAL
    {
        GRAPH <http://localhost:8890/KCTGOV>
        {
            OPTIONAL{ ?govNctIdURI lct:enrollment ?enrollKCT.}
            # Create STR of NCTID for merge
            BIND(UCASE(strafter(str(?govNctIdURI), "trial/")) AS ?nctId )
        }  
    }
}
ORDER BY ?orgId

Upvotes: 2

Views: 326

Answers (1)

Jeen Broekstra

Reputation: 22042

You can do this with an IF operation, like so:

select (if(bound(?enrollkmd), ?enrollkmd, ?enrollKCT) as ?enrollment)
where ...

The IF operator checks whether ?enrollkmd is bound to a value. If it is, IF returns that value; otherwise it returns the value of ?enrollKCT. The result is then bound to the ?enrollment variable in your query result.

Of course, since you are no longer using the wildcard-select ('*'), you will now need to explicitly add all variables you want returned. So the full select-clause will become something like this:

select ?obs ?orgId ?nctId (if(bound(?enrollkmd), ?enrollkmd, ?enrollKCT) as ?enrollment)

Adapt to taste.
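Put together with the query from the question, the result might look like the sketch below. The graph pattern is copied from the question without changes to its logic, and only the select clause is new. I have not run this against your data, so treat it as a starting point rather than a tested query.

```sparql
PREFIX kmd: <http://www.example.org/kmd/>
PREFIX lct: <http://data.linkedct.org/vocab/resource/>

# Project one merged ?enrollment column: the KMD value when it is bound,
# otherwise the KCTGOV value.
SELECT ?obs ?orgId ?nctId
       (IF(BOUND(?enrollkmd), ?enrollkmd, ?enrollKCT) AS ?enrollment)
FROM NAMED <http://localhost:8890/KMD>
FROM NAMED <http://localhost:8890/KCTGOV>
WHERE
{
    GRAPH <http://localhost:8890/KMD>
    {
        ?obs a kmd:Study ;
             kmd:hasOrgId ?orgId .
        OPTIONAL { ?obs kmd:hasNctId ?nctIdURI . }
        OPTIONAL { ?obs kmd:hasEnrollment ?enrollkmd . }
        # String form of the NCT ID, used as the join key
        BIND(STRAFTER(STR(?nctIdURI), "kmd/") AS ?nctId)
    }
    OPTIONAL
    {
        GRAPH <http://localhost:8890/KCTGOV>
        {
            OPTIONAL { ?govNctIdURI lct:enrollment ?enrollKCT . }
            # String form of the NCT ID, upper-cased to match the KMD form
            BIND(UCASE(STRAFTER(STR(?govNctIdURI), "trial/")) AS ?nctId)
        }
    }
}
ORDER BY ?orgId
```

If neither graph has a value for a study, ?enrollment is simply left unbound for that row. SPARQL 1.1 also defines COALESCE, so COALESCE(?enrollkmd, ?enrollKCT) should be an equivalent, shorter way to write the same "first bound value wins" expression.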

Upvotes: 2
