Reputation: 143
Do you know where I can find source code (in any language) for an information retrieval system based on the probabilistic model?
I searched the web and found an algorithm named BM25 (or BM25F), but I don't know if it is useful.
Basically, I'm trying to compare the performance of three IR algorithms: the vector space model, the Boolean model, and the probabilistic model. So far I have found the vector space and Boolean models. Depending on the results, we will use the best of them to develop a question-answering system.
Thanks in advance
Upvotes: 0
Views: 1896
Reputation: 25
The confusion here is that there are several probabilistic IR models (e.g. 2-Poisson, the Binary Independence Model, language modeling variants), so the question is ambiguous. In my experience, though, when people say "the probabilistic model" they usually mean some variant of the Binary Independence Model due to Robertson and Spärck Jones. BM25 (quite roughly) approximates this model, and that's what I'd use in this case. A canonical implementation of BM25 is included in the Lemur Toolkit. See:
http://www.lemurproject.org/doxygen/lemur/html/OkapiRetMethod_8hpp-source.html
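If you just want to see what BM25 computes before digging into a toolkit, here is a minimal sketch in Python. It is not the Lemur code; the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the smoothed "+1" IDF are my own assumptions (commonly used choices), and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are assumed defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps the weight
            # non-negative (an assumed variant, other forms exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus, already tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "probabilistic models rank documents".split(),
]
scores = bm25_scores("cat mat".split(), docs)
```

With this toy corpus the first document matches both query terms, the second matches only one, and the third matches none, so the scores come out in that order. For a real comparison you would also want proper tokenization, stopword handling and stemming applied identically across all three models.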
Upvotes: 0
Reputation: 3005
If you are looking for an IR engine that has BM25 implemented, you can try the Terrier IR Platform.
The language is Java. You can either use the engine itself or look into the source code for implementations of BM25 or other term weighting models.
Upvotes: 1