avd

Reputation: 14451

Any Latent Semantic Indexing?

Is there an open source implementation of LSI in Java? I want to use such a library in my project. I have seen jLSI, but it implements a different model of LSI. I want the standard model.

Upvotes: 6

Views: 8489

Answers (6)

Matt Wright

Reputation: 1

Have you tried the SemanticVectors package?

http://code.google.com/p/semanticvectors/

Upvotes: 0

S Gaber

Reputation: 1560

A Google search for NLP tools turns up these slides, which I think will help ...

Upvotes: 1

David Jurgens

Reputation: 304

The S-Space Package has an open source version of LSA, with bindings for the LSI document vectors. (Both approaches operate on the same term-document matrix and are equivalent except in the output.) It's a fairly scalable approach that uses the thin SVD. I've used it to run LSI on all of Wikipedia with no issues (after removing infrequent terms with fewer than 5 occurrences).
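To make the "same term-document matrix, truncated SVD" idea concrete, here is a minimal, dependency-free sketch. It is not the S-Space or SemanticVectors API: the class `TinyLsi`, its method names, and the 4x4 example matrix are all made up for illustration, and plain power iteration with deflation stands in for a real thin-SVD solver such as SVDLIBJ. It only shows how document vectors fall out of the top k singular triplets.

```java
import java.util.Arrays;

/** Toy LSI (hypothetical helper, not a library API): rank-k truncated SVD via power iteration. */
final class TinyLsi {
    // Made-up counts. Rows are terms (java, lucene, patent, law); columns are documents 0..3.
    static final double[][] EXAMPLE = {
        {2, 1, 0, 0},
        {1, 2, 0, 0},
        {0, 0, 1, 1},
        {0, 0, 1, 1},
    };

    /** Returns one k-dimensional vector per document: docs[j][r] = sigma_r * v_r[j]. */
    static double[][] documentVectors(double[][] a, int k, int iterations) {
        int n = a[0].length;
        double[][] basis = new double[k][];   // right singular vectors found so far
        double[][] docs = new double[n][k];
        for (int r = 0; r < k; r++) {
            double[] v = new double[n];
            for (int j = 0; j < n; j++) v[j] = 1.0 / (j + 1);   // fixed, non-random start
            for (int it = 0; it < iterations; it++) {
                double[] w = timesTranspose(a, times(a, v));    // w = A^T (A v)
                removeComponents(w, basis, r);                  // deflate earlier vectors
                double len = norm(w);
                if (len == 0) { Arrays.fill(v, 0); break; }     // nothing left to extract
                for (int j = 0; j < n; j++) v[j] = w[j] / len;
            }
            basis[r] = v;
            double sigma = norm(times(a, v));                   // singular value for v
            for (int j = 0; j < n; j++) docs[j][r] = sigma * v[j];
        }
        return docs;
    }

    static double[] times(double[][] a, double[] v) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < v.length; j++) out[i] += a[i][j] * v[j];
        return out;
    }

    static double[] timesTranspose(double[][] a, double[] u) {
        double[] out = new double[a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < out.length; j++) out[j] += a[i][j] * u[i];
        return out;
    }

    // Subtracts from w its projections onto the first `count` basis vectors.
    static void removeComponents(double[] w, double[][] basis, int count) {
        for (int r = 0; r < count; r++) {
            double dot = 0;
            for (int j = 0; j < w.length; j++) dot += w[j] * basis[r][j];
            for (int j = 0; j < w.length; j++) w[j] -= dot * basis[r][j];
        }
    }

    static double norm(double[] v) {
        double sum = 0;
        for (double x : v) sum += x * x;
        return Math.sqrt(sum);
    }

    static double cosine(double[] x, double[] y) {
        double dot = 0;
        for (int i = 0; i < x.length; i++) dot += x[i] * y[i];
        return dot / (norm(x) * norm(y));
    }

    public static void main(String[] args) {
        double[][] docs = documentVectors(EXAMPLE, 2, 200);
        System.out.println("doc0 vs doc1: " + cosine(docs[0], docs[1]));
        System.out.println("doc0 vs doc2: " + cosine(docs[0], docs[2]));
    }
}
```

With k = 2 on this toy matrix, documents 0 and 1 (which share the terms java and lucene) should come out nearly parallel, while documents 0 and 2 (no shared terms) should come out nearly orthogonal. For anything beyond a toy corpus you would want a sparse matrix and a proper SVD library rather than this loop.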

As Scott Ray mentioned, the SemanticVectors package also has a good LSI implementation that recently switched to using the same thin SVD (SVDLIBJ), so you might check that out if you haven't already.

Upvotes: 1

Andrew Beck

Reputation: 1

I believe that LSA/LSI was patented in 1989, which means the patent should have just expired. Hopefully we will see some nice open source applications soon.

Upvotes: 0

Toshio Nakamura

Reputation: 66

Have you considered LDA (Latent Dirichlet allocation)? I haven't really either, but I encountered the same problem with LSI recently (patents). From what I understand LDA is a related/more powerful technique. http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation apparently has some links to open-source implementations.

Upvotes: 5

Scott Ray

Reputation: 11

A Google search for "java LSI" leads to a similar question that recommends SemanticVectors, a package built on top of Lucene that is 'similar' to LSI. I don't know if it's closer to standard LSI than the jLSI implementation.

That thread also mentions that LSI is patented and that there aren't a lot of implementations of it. So if you need a standard implementation, you may have to use a language other than Java.

Upvotes: 1
