kju

Reputation: 146

Represent a document as a vector with Lucene

I want to build document vectors for SVM text categorization. I have indexed my documents in two classes, POSITIVE and NEGATIVE, and I selected my feature space with the information gain (IG) method.

How can I represent a document as a vector of tf-idf term weights using Lucene?

Thanks!

Best regards!

Upvotes: 1

Views: 656

Answers (1)

Shashikant Kore

Reputation: 5052

Apache Mahout is a machine learning library written in Java. It has utilities to create document vectors from a Lucene index (created from raw text). You can adapt the code to your requirements.
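
If you would rather compute the weights yourself, here is a minimal sketch of the tf-idf step in plain Java. It is not Mahout's code and does not call Lucene: the class `TfIdfVectorizer`, its method names, and the toy corpus are hypothetical, and it assumes the common weighting tf * ln(N / df). With a Lucene index you would presumably take N from `IndexReader.numDocs()` and df from `IndexReader.docFreq(Term)` instead of counting in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: builds tf-idf vectors over a fixed, pre-selected feature list.
public class TfIdfVectorizer {

    // Count, for each feature term, how many documents contain it (document frequency).
    static Map<String, Integer> documentFrequencies(List<List<String>> corpus, List<String> features) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : corpus) {
            Set<String> seen = new HashSet<>(doc);
            for (String feature : features) {
                if (seen.contains(feature)) {
                    df.merge(feature, 1, Integer::sum);
                }
            }
        }
        return df;
    }

    // One vector component per feature, weighted as tf * ln(numDocs / df).
    // Features absent from the document (or from the corpus) get weight 0.
    static double[] vectorize(List<String> doc, List<String> features,
                              Map<String, Integer> df, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : doc) {
            tf.merge(token, 1, Integer::sum);
        }
        double[] vector = new double[features.size()];
        for (int i = 0; i < features.size(); i++) {
            String feature = features.get(i);
            int termFreq = tf.getOrDefault(feature, 0);
            int docFreq = df.getOrDefault(feature, 0);
            if (termFreq > 0 && docFreq > 0) {
                vector[i] = termFreq * Math.log((double) numDocs / docFreq);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        // Toy corpus of already-tokenized documents (assumption for illustration).
        List<List<String>> corpus = new ArrayList<>();
        corpus.add(Arrays.asList("good", "movie", "good"));
        corpus.add(Arrays.asList("bad", "movie"));

        // Feature terms, e.g. the ones kept after information-gain selection.
        List<String> features = Arrays.asList("good", "bad", "movie");

        Map<String, Integer> df = documentFrequencies(corpus, features);
        for (List<String> doc : corpus) {
            System.out.println(Arrays.toString(vectorize(doc, features, df, corpus.size())));
        }
    }
}
```

Each resulting array has one entry per selected feature, in feature order, so it can be written out in whatever sparse format your SVM tool expects. A term that occurs in every document gets weight 0 under this formula; other tf-idf variants smooth the idf to avoid that.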

Upvotes: 1
