Wolfgang
Wolfgang

Reputation: 221

Ontologies, OWL, Sparql: Modelling that "something is not there" and performance considerations

we want to model that "something is not there" as opposed to missing information, e.g. an explicit statement that "a patient did not get chemotherapy" or that "a patient does not have dyspnea" is different from missing information about whether a patient has dyspnea.

We thought about several approaches, e.g.

We feel like the 3rd option is the most "ontologically correct" way of expressing this. However, when playing around with it we encountered severe performance problems in simple scenarios.

We are using Sesame with an OWLIM-Lite store and imported the NCI thesaurus (280MB, about 80,000 concepts) and another very small ontology into the store and added two individuals, one having that complementOf/restriction class.

The following query took forever to execute and I terminated it after 15 minutes:

select *
where {
  ?s a [ rdf:type owl:Class;
                      owl:complementOf [
                        rdf:type owl:Restriction ;
                        owl:onProperty roo:has_finding;
                        owl:someValuesFrom nci:Dyspnea;
                  ]
                ] .
} Limit 100

Does anybody know why? I would assume that this approach creates a lot of blank nodes and the query engine has to go through the entire NCI thesaurus and compare all blank nodes with this one?

If I put this triple in a separate graph and only query that graph, the query returns the result instantaneously.

To sum things up. The two basic questions are:

EDIT 1

We discussed the proposed options. It actually helped us in clarifying what we are really trying to achieve:

We played around with options:

That way, the person writing the Sparql query only has to know that:

We would really appreciate some feedback on these conclusions!

Upvotes: 7

Views: 643

Answers (2)

Jeen Broekstra
Jeen Broekstra

Reputation: 22042

With respect to the modeling question, I'd like to offer a fourth alternative, which is, in fact, a mix of your options 1 and 2: introduce a separate class (hierarchy) for these 'excluded/missing' symptoms, diseases or treatments, and have the specific exclusions as instances:

 :Exclusion a owl:Class .
 :ExcludedSymptom rdfs:subClassOf :Exclusion .
 :ExcludedTreatment rdfs:subClassOf :Exclusion .

 :excludedDyspnea a :ExcludedSymptom .
 :excludedChemo a :ExcludedTreatment .

 :Patient a owl:Class ;
          owl:equivalentClass [ a owl:Restriction ;
                                owl:onProperty :excluded ;
                                owl:allValuesFrom :Exclusion ] .

 // john is a patient without Dyspnea 
 :john a :Patient ;
       :excluded :excludedDyspnea .

Optionally, you can link the exclusion instances semantically with the treatment/symptom/diseases:

  :excludedDyspnea :ofSymptom :Dyspnea . 

In my view, this is just as "ontologically correct" (this kind of thing is quite subjective to be honest) as your other options, and possibly a lot easier to maintain, query, and indeed reason with.

As for your second question: while I can't speak for the behavior of the particular reasoner you're using, in general any construction involving complementOf is computationally very heavy, but perhaps more importantly, it probably does not capture what you intend.

OWL has an open world assumption, which (in broad terms) means that we cannot decide a certain fact is untrue simply because that fact is currently unknown. Your complementOf construction will logically be an empty class, because for any individual X, even if we currently do not know that X has been diagnosed with Dyspnea, there is a possibility that in the future that fact will become known, and therefore X will not be in the complement class.

EDIT

In response to your edit, with the proposal using a single :hasFinding property, I think that generally looks good, though I would perhaps modify it slightly:

   :patient1 a :Patient;
             :hasFinding :dyspneaFinding1 .

   :dyspneaFinding1 a :Finding ;
                    :of :Dyspnea ;
                    :conclusion false .

You have now separated the 'finding' as a concept a bit more cleanly from the symptom/treatment that it is a finding of. Also, whether or not the finding is positive or negative is explicitly modeled (rather than implied by the presence/absense of an 'excluded' property or a 'Exclusion' type).

(As an aside: since we link an individual with a class here via a non-typing relation (... :of :Dyspnea) we must rely on OWL 2 punning to make this valid in OWL DL)

To query for a patient with a finding (whether positive or negative) about Dyspnea:

 SELECT ?x 
 WHERE {
    ?x a :Patient; 
       :hasFinding [ :of :Dyspnea ] .
 }

And to query for patients with confirmed absense of Dyspnea:

 SELECT ?x 
 WHERE {
    ?x a :Patient; 
       :hasFinding [ :of :Dyspnea ;
                     :conclusion false ] .
 }

Upvotes: 2

Joshua Taylor
Joshua Taylor

Reputation: 85813

If your diseases are represented as individuals, then you can use negative object property assertions to literally say, e.g.,

         ¬hasFinding(john,Dyspnea)

        NegativeObjectPropertyAssertion(hasFinding john Dyspnea)

Of course, if you have lots of things that aren't the case, then this might get a bit involved. It's probably the most semantically correct, though. It also means that your query could match directly against the data in the ontology, which might make for quicker results. (Of course, you'd still have the issues of trying to infer when the negative object property holds.)

This doesn't work if diseases are represented as classes, though. If diseases are represented by classes, then you can use class expressions, similar to what you propose. E.g.,

        (∀ hasFinding.¬Dyspnea)(john)

        ClassAssertion(ObjectAllValuesFrom(hasFinding ObjectComplementOf(Dyspnea)) john)

This is similar to your third option, but I wonder if it might perform better. It seems like a slightly more direct way of saying what you're trying to say (i.e., if someone has a disease, it's not one of these diseases).

I do agree with Jeen's answer, though; there's a lot of subjectivity here, and a great deal of getting it "right" is actually just a matter of finding something that's reasonable to work with, performs well enough for you, and that seems not entirely unnatural.

Upvotes: 3

Related Questions