I have to create a dataset from some text files, writing each file as a vector of features.
Something like this:
doc1: 1,0.45 6,0.001 94,0.1 ...
doc2: 3,0.5 98,0.2 ...
...
Each position of the vector represents a word, and the score is given by something like TF-IDF.
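The following is a minimal sketch of what such a tool computes. It makes several assumptions that are not in the question: plain-text input, a simple lowercase tokenizer that splits on non-letter/digit characters, TF = term count divided by document length, and IDF = ln(N / df). That is only one of several common TF-IDF variants. Each word gets a 1-based index in order of first appearance, and vectors are printed in the `index,score` format shown above. The names `TfIdfVectors`, `tokenize`, `vectorize` and `format` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TfIdfVectors {

    // Assumed tokenizer: lowercase, then split on runs of non-letter/non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Builds one sparse vector per document (word index -> TF-IDF score).
    // New words are added to vocab with indices 1, 2, 3, ... in order of first appearance.
    static List<TreeMap<Integer, Double>> vectorize(List<String> docs, Map<String, Integer> vocab) {
        List<Map<String, Integer>> counts = new ArrayList<>();
        List<Integer> lengths = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            List<String> tokens = tokenize(doc);
            Map<String, Integer> c = new HashMap<>();
            for (String t : tokens) {
                vocab.putIfAbsent(t, vocab.size() + 1);
                c.merge(t, 1, Integer::sum);
            }
            // Document frequency: count each distinct term once per document.
            for (String t : c.keySet()) docFreq.merge(t, 1, Integer::sum);
            counts.add(c);
            lengths.add(tokens.size());
        }
        int n = docs.size();
        List<TreeMap<Integer, Double>> vectors = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            TreeMap<Integer, Double> vec = new TreeMap<>(); // sorted by word index
            for (Map.Entry<String, Integer> e : counts.get(i).entrySet()) {
                double tf = (double) e.getValue() / lengths.get(i);
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                double score = tf * idf;
                // Words present in every document get IDF 0 and are left out (sparse vector).
                if (score > 0) vec.put(vocab.get(e.getKey()), score);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    // Formats a vector as "index,score index,score ..." with three decimal places.
    static String format(TreeMap<Integer, Double> vec) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Double> e : vec.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append(',')
              .append(String.format(Locale.ROOT, "%.3f", e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat sat", "the dog sat down", "a cat and a dog");
        Map<String, Integer> vocab = new LinkedHashMap<>();
        List<TreeMap<Integer, Double>> vectors = vectorize(docs, vocab);
        for (int i = 0; i < vectors.size(); i++) {
            System.out.println("doc" + (i + 1) + ": " + format(vectors.get(i)));
        }
    }
}
```

A `TreeMap` keeps each vector sorted by word index, so the output matches the ascending-index layout of the example. Swapping in a different TF or IDF variant only means changing the two lines that compute `tf` and `idf`.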
Do you know of a library or tool for this? (Java is preferred.)
After a few days, I found the "perfect tool" for this: Word Vector Tool, http://sourceforge.net/projects/wvtool/
Sure, there are many, e.g. Lucene (http://en.wikipedia.org/wiki/Lucene).
However, I recommend that you write a basic IR (information retrieval) system from scratch. Looking under the hood is always a great learning experience.
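If you go the from-scratch route, a common starting point is an inverted index. This maps each term to the set of documents that contain it, and that is the core structure full-text libraries are built around. The following is a minimal sketch with hypothetical names (`InvertedIndex`, `add`, `search`). It reuses the same assumed lowercase tokenizer and supports only boolean AND queries, with no ranking.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InvertedIndex {
    // term -> sorted set of IDs of the documents containing it (the "postings list")
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Assumed tokenizer: lowercase, split on runs of non-letter/non-digit characters.
    // May yield empty strings (e.g. for empty input), which callers skip.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Records that every term of `text` occurs in document `docId`.
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Boolean AND query: returns the IDs of documents containing every query term.
    public Set<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query)) {
            if (term.isEmpty()) continue;
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);  // copy so the index itself is not modified
            } else {
                result.retainAll(docs);        // intersect postings lists
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "the cat sat");
        index.add(2, "the dog sat down");
        index.add(3, "a cat and a dog");
        System.out.println(index.search("cat"));     // [1, 3]
        System.out.println(index.search("dog sat")); // [2]
    }
}
```

Ranking the matches, for instance by the TF-IDF scores the question asks about, would be a natural next step on top of this structure.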