cHam

Reputation: 2664

Specific type of SPARQL query assistance needed, please

I have a database of RDF triples along the lines of:

keyword002 isKeywordIn doc0892
keyword002 phrase "thisIsATest"

I have a list of keywords that I want to match in a single query. For example, let's say I have 10 documents, and I want to know, for each document, which (if any) of the keywords "testing3", "fubared", and "noob" it contains.

What SPARQL query could I construct that lets me specify a list of exact words and get back the name of each containing document along with the keywords that matched? (I tried a regex filter, but it didn't seem to work: it gave me partial matches too and didn't give me the document name.)

I have been stuck on this for days. I can get it to work, but only through a few recursive loops that take forever, and I need to speed things up drastically.

My server is down, so I can't access my triplestore right now, but thank you both for the replies! If I have any questions I will repost. Thank you so much!

Upvotes: 0

Views: 251

Answers (2)

Ian Dickinson

Reputation: 13295

Untested, but one approach would be something like:

select distinct ?keyword ?document
where {
  ?keyword ns:isKeywordIn ?document;
           ns:phrase ?phrase.
  FILTER regex( ?phrase, "^(testing3|noob|fubared)$", "i" )
}

This will give you pairs of document and keyword, where the keyword's phrase matches any one of the user-supplied words. Note the ^ ... $ anchors: they ensure you only get whole-phrase matches, not partial ones. However, this may be slow: there is little distinctive information for the engine to use an index on, so it will have to test the regex against the phrase of every keyword in the corpus.
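To see why the anchors matter, here is a small Python sketch that uses the standard `re` module as a stand-in for SPARQL's `regex` function. SPARQL uses XPath/XQuery regular-expression syntax, which is close to, but not identical with, Python's. The sample phrases are hypothetical.

```python
import re

# Hypothetical keyword phrases, standing in for the ns:phrase literals in the store.
phrases = ["testing3", "testing34", "Noob", "noobish", "fubared", "thisIsATest"]

# Unanchored: matches any phrase that merely *contains* one of the keywords.
unanchored = re.compile(r"testing3|noob|fubared", re.IGNORECASE)

# Anchored with ^ ... $: the whole phrase must equal one of the alternatives.
anchored = re.compile(r"^(testing3|noob|fubared)$", re.IGNORECASE)


def matching(pattern, values):
    """Return the values the pattern matches, like FILTER regex(?phrase, ...)."""
    return [v for v in values if pattern.search(v)]


print(matching(unanchored, phrases))
# → ['testing3', 'testing34', 'Noob', 'noobish', 'fubared']
print(matching(anchored, phrases))
# → ['testing3', 'Noob', 'fubared']
```

The unanchored pattern reproduces the partial matches described in the question ("testing34", "noobish"); the anchored one keeps only whole-phrase matches, and the ignore-case flag still lets "Noob" through.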

An alternative is to union the tests for multiple keywords:

select distinct ?keyword ?document
where {
  {?keyword ns:phrase "testing3" ; ns:isKeywordIn ?document}
  union
  {?keyword ns:phrase "noob" ; ns:isKeywordIn ?document}
  union
  {?keyword ns:phrase "fubared" ; ns:isKeywordIn ?document}
}

A reasonable query optimizer should be able to use the more specific ns:phrase triple patterns (each has a constant object) to drive index lookups. However, the query is slightly more complex to construct. Another drawback is that there is no equivalent of the ignore-case ("i") flag from the regex example, so your user input must match your keyword text exactly.
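Building the UNION form by hand gets tedious as the keyword list grows. A minimal Python sketch that generates it from a list, assuming the `ns:phrase` / `ns:isKeywordIn` properties used above; the function names are hypothetical, and the caller still has to supply the `PREFIX ns:` declaration. The escaping covers the four characters a double-quoted SPARQL literal may not contain raw: `"`, `\`, line feed and carriage return.

```python
def sparql_string_literal(text):
    """Quote text as a double-quoted SPARQL literal.

    Escapes backslash, double quote, line feed and carriage return,
    which may not appear unescaped inside "...".
    """
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_union_query(keywords, prefix="ns"):
    """Build a SELECT with one UNION branch per exact keyword (hypothetical helper).

    Assumes the ns:phrase / ns:isKeywordIn properties; the PREFIX
    declaration for `prefix` is left to the caller.
    """
    if not keywords:
        raise ValueError("need at least one keyword")
    branches = [
        f"  {{?keyword {prefix}:phrase {sparql_string_literal(k)} ; "
        f"{prefix}:isKeywordIn ?document}}"
        for k in keywords
    ]
    return (
        "select distinct ?keyword ?document\nwhere {\n"
        + "\n  union\n".join(branches)
        + "\n}"
    )


print(build_union_query(["testing3", "noob", "fubared"]))
```

For the three keywords in the question this prints the same shape of query as the hand-written UNION example, one branch per keyword.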

A final alternative is to use a SPARQL extension to exploit a free-text index alongside the triple store. E.g. for Jena, see LARQ.
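The point of a free-text index is to look words up directly instead of running a regex over every literal. The toy Python sketch below only illustrates that idea with two dictionaries (lower-cased phrase to keyword nodes, keyword node to documents). It is not LARQ's or Lucene's API, it does no tokenizing, and the data and function names are hypothetical. Lower-casing at index time also gives a case-insensitive exact match, which the UNION query lacks.

```python
from collections import defaultdict

# Hypothetical sample triples in the question's shape.
TRIPLES = [
    ("keyword001", "isKeywordIn", "doc01"),
    ("keyword001", "phrase", "testing3"),
    ("keyword002", "isKeywordIn", "doc01"),
    ("keyword002", "phrase", "Noob"),
    ("keyword003", "isKeywordIn", "doc02"),
    ("keyword003", "phrase", "testing3"),
]


def build_indexes(triples):
    """Build lower-cased phrase -> keyword nodes and keyword node -> documents maps."""
    by_phrase = defaultdict(set)
    docs_of = defaultdict(set)
    for s, p, o in triples:
        if p == "phrase":
            by_phrase[o.lower()].add(s)
        elif p == "isKeywordIn":
            docs_of[s].add(o)
    return by_phrase, docs_of


def lookup(by_phrase, docs_of, words):
    """Return sorted (keyword node, document) pairs for case-insensitive exact matches."""
    pairs = set()
    for w in words:
        # dict lookups replace the per-literal regex scan
        for node in by_phrase.get(w.lower(), ()):
            for doc in docs_of.get(node, ()):
                pairs.add((node, doc))
    return sorted(pairs)


by_phrase, docs_of = build_indexes(TRIPLES)
print(lookup(by_phrase, docs_of, ["testing3", "NOOB"]))
# → [('keyword001', 'doc01'), ('keyword002', 'doc01'), ('keyword003', 'doc02')]
```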

Upvotes: 3

Michael

Reputation: 4886

Generally, you should avoid using regex in a SPARQL query; most SPARQL engines are not designed to handle it efficiently. The engines that do handle it well provide dedicated functionality for regex or keyword-style searches over literal values, often backed by a special Lucene index. Otherwise, a regex filter typically ends up being evaluated against every relevant literal value, which can be very expensive.

This should return the documents containing the keyword "testing3":

select ?doc ?name where {
  ?doc :name ?name .
  ?keyword :isKeywordIn ?doc .
  ?keyword :phrase "testing3" .
}

If you want all documents that contain both of two specific keywords:

select ?doc ?name where {
  ?doc :name ?name .
  ?keyword :isKeywordIn ?doc .
  ?keyword :phrase "testing3" .
  ?kw :isKeywordIn ?doc .
  ?kw :phrase "noob" .
}

If you want all documents that contain either of two specific keywords:

select distinct ?doc ?name where { 
  ?doc :name ?name .
  {
    ?keyword :isKeywordIn ?doc .
    ?keyword :phrase "testing3" .
  } union {
    ?kw :isKeywordIn ?doc .
    ?kw :phrase "noob" .
  }
}
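The question also asks which keywords matched in each document. A minimal in-memory Python sketch of that "either" logic over hypothetical triples: one exact-match test per keyword, as in the UNION, but keeping the matched phrase alongside the document. It assumes each keyword node has a single phrase; the function name is hypothetical.

```python
from collections import defaultdict

# Hypothetical sample triples in the question's shape.
TRIPLES = [
    ("keyword001", "isKeywordIn", "doc01"),
    ("keyword001", "phrase", "testing3"),
    ("keyword002", "isKeywordIn", "doc01"),
    ("keyword002", "phrase", "noob"),
    ("keyword003", "isKeywordIn", "doc02"),
    ("keyword003", "phrase", "testing3"),
    ("keyword004", "isKeywordIn", "doc03"),
    ("keyword004", "phrase", "fubared"),
    ("keyword005", "isKeywordIn", "doc03"),
    ("keyword005", "phrase", "thisIsATest"),
]


def matches_by_document(triples, wanted):
    """Map each document to the sorted list of wanted phrases it contains.

    Exact matches only; assumes one phrase per keyword node.
    """
    wanted = set(wanted)
    # keyword node -> its phrase, kept only when the phrase is wanted
    phrase_of = {s: o for s, p, o in triples if p == "phrase" and o in wanted}
    found = defaultdict(set)
    for s, p, o in triples:
        if p == "isKeywordIn" and s in phrase_of:
            found[o].add(phrase_of[s])
    return {doc: sorted(ph) for doc, ph in found.items()}


print(matches_by_document(TRIPLES, ["testing3", "fubared", "noob"]))
# → {'doc01': ['noob', 'testing3'], 'doc02': ['testing3'], 'doc03': ['fubared']}
```

Documents with no matching keyword simply do not appear, just as they produce no rows from the UNION query.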

I think this will get you what you're looking for, typos and the exact terms of your domain ontology notwithstanding.

Upvotes: 2
