tadumtada

Building a full-text search index for Jena and Lucene

I would like to perform a full-text search with Lucene and Jena over a subset of DBpedia, which I have in a TDB store:

// Open the existing TDB store ("path" is the store's directory)
String tdbDirectory = "path";
Dataset dataset = TDBFactory.createDataset(tdbDirectory);

However, I don't want to search over all resources, only over titles. I think that by building an index over only the triples I need, I can make the search faster. For example:

<http://de.dbpedia.org/resource/Gurke> <http://www.w3.org/2000/01/rdf-schema#label> "Gurke"@de .

Here I would like to search for "Gurke", but only in triples with the #label property (rdfs:label), not in any others. So my question is: how do I build an index over, and search, only the triples with the #label property? I have already looked at http://jena.sourceforge.net/ARQ/lucene-arq.html, but it is either not detailed enough or too difficult for me.
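To make "index only the #label triples" concrete, here is a toy, in-memory model of the idea: an inverted index that reads N-Triples lines, keeps only those whose predicate is rdfs:label, and maps each word of the label to its subject. This is plain Java and deliberately NOT the Jena or Lucene API; the class name LabelIndex, the simplified N-Triples regex (no escapes, datatypes, or blank nodes), and the rdfs:comment example triple are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy in-memory model of a predicate-restricted full-text index.
 * Not Jena or Lucene code: it only illustrates indexing the literals
 * of rdfs:label triples while ignoring every other triple.
 */
final class LabelIndex {
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";

    // Simplified N-Triples pattern: <subject> <predicate> "literal"(@lang)? .
    // Assumption: no escaped quotes, datatypes, or blank nodes.
    private static final Pattern TRIPLE = Pattern.compile(
            "^<([^>]*)>\\s+<([^>]*)>\\s+\"([^\"]*)\"(?:@[A-Za-z0-9-]+)?\\s*\\.$");

    // Inverted index: lower-cased token -> subjects whose label contains it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes one N-Triples line, but only if its predicate is rdfs:label. */
    void add(String line) {
        Matcher m = TRIPLE.matcher(line.trim());
        if (!m.matches() || !RDFS_LABEL.equals(m.group(2))) {
            return; // not a label triple, so it never enters the index
        }
        for (String token : tokenize(m.group(3))) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(m.group(1));
        }
    }

    /** Returns the subjects whose label contains the given single word. */
    Set<String> search(String word) {
        Set<String> hits = postings.get(word.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(hits);
    }

    // Splits on anything that is not a Unicode letter or digit,
    // so accented letters stay inside a token
    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        LabelIndex index = new LabelIndex();
        index.add("<http://de.dbpedia.org/resource/Gurke> "
                + "<http://www.w3.org/2000/01/rdf-schema#label> \"Gurke\"@de .");
        // Made-up non-label triple that also mentions "Gurke": it is skipped
        index.add("<http://de.dbpedia.org/resource/Salat> "
                + "<http://www.w3.org/2000/01/rdf-schema#comment> \"Salat mit Gurke\"@de .");
        System.out.println(index.search("gurke")); // prints [http://de.dbpedia.org/resource/Gurke]
    }
}
```

The point of the sketch is the early return in add(): triples that are not labels never reach the index, so it stays small and every hit is guaranteed to come from a label. A real setup would leave tokenization and storage to Lucene.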

Answers (1)

AndyS

http://jena.sourceforge.net/ is the old home of Jena; the project now lives at http://jena.apache.org/ (how did you manage to find that old page?)

The project recently introduced a replacement for LARQ, the Lucene integration described on that old page:

http://jena.apache.org/documentation/query/text-query.html

It is now part of the main codebase and will be released with Jena 2.10.2. For now you must use the development build from https://repository.apache.org/content/repositories/snapshots/org/apache/jena/ and either use Fuseki or add it as a dependency to your project.

This new text search subsystem works much better with TDB and Fuseki than LARQ did.
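Once the text index is configured to cover rdfs:label (the jena-text documentation linked above describes that configuration), the search from the question becomes a SPARQL query using the text:query property function in the form `?s text:query (property 'query string' limit)`. The sketch below only builds that query string; the class and method names are hypothetical, and actually running it assumes a text-enabled dataset, which is not shown. Note that the word is also handed to Lucene's query parser, so Lucene syntax characters in it may need escaping as well.

```java
/**
 * Builds a SPARQL query string for jena-text's text:query property function,
 * restricted to rdfs:label. Only string building happens here; executing it
 * needs a Jena dataset whose text index maps rdfs:label (assumed, not shown).
 */
final class TextQueryExample {

    // Hypothetical helper: escapes backslashes first, then single quotes,
    // so the word is a valid single-quoted SPARQL string literal
    static String labelSearchQuery(String word, int limit) {
        String escaped = word.replace("\\", "\\\\").replace("'", "\\'");
        return String.join("\n",
                "PREFIX text: <http://jena.apache.org/text#>",
                "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
                "SELECT ?s ?label WHERE {",
                "  ?s text:query (rdfs:label '" + escaped + "' " + limit + ") ;",
                "     rdfs:label ?label .",
                "}");
    }

    public static void main(String[] args) {
        // Print the query that would look up "Gurke" in the label index
        System.out.println(labelSearchQuery("Gurke", 10));
    }
}
```

Because the property in text:query names rdfs:label explicitly, only the label field of the index is consulted, which is the restriction the question asks for.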
