I have indexed data from 10 websites in Solr. Now I want to dump the data for each website in the following format: [term, frequency of the term in that website, IDF, website]
e.g.: [management,12,145,example.com]
where 12 is the frequency of the term in example.com and 145 is the IDF of the term in the index.
Can I do this with Solr, and how?
Some low-level Lucene API (Lucene 3.x; TermDocs was replaced by DocsEnum in Lucene 4):
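import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

// `directory` is the Solr index directory, e.g. FSDirectory.open(new File("/path/to/solr/data/index"))
IndexReader reader = IndexReader.open(directory);
Term term = new Term("content", "management"); // hypothetical field name and term
TermDocs termDocs = reader.termDocs(term);      // positioned on the docs containing `term`; plain termDocs() is unpositioned and must be seek()ed first
while (termDocs.next()) {
    System.out.println("Doc #: " + termDocs.doc());
    System.out.println("Full document: " + reader.document(termDocs.doc()));
    System.out.println("Term frequency: " + termDocs.freq()); // occurrences of `term` in this doc
}
termDocs.close();
reader.close();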
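The loop above gives frequencies per document, while the question asks for totals per website. Below is a minimal sketch of that roll-up, assuming each document carries a stored field naming its website (hypothetically called `site`) and that you have already collected one (term, site, freq) posting per document from TermDocs. The `Posting` class and `dump` method are illustrative names, not Lucene API. In a real index you could read the document frequency from `reader.docFreq(term)` instead of counting it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: roll per-document term frequencies up to per-website totals. */
public class SiteTermDump {

    /** One (term, doc) posting as read from TermDocs; `site` is a hypothetical stored field. */
    public static final class Posting {
        final String term;
        final String site;
        final int freq;

        public Posting(String term, String site, int freq) {
            this.term = term;
            this.site = site;
            this.freq = freq;
        }
    }

    /** Returns rows "[term,freqInSite,docFreq,site]" sorted by term, then site. */
    public static List<String> dump(List<Posting> postings) {
        // term -> site -> summed frequency of the term over that site's documents
        Map<String, Map<String, Integer>> freqBySite = new TreeMap<>();
        // term -> number of documents containing the term (one posting per doc is assumed)
        Map<String, Integer> docFreq = new TreeMap<>();
        for (Posting p : postings) {
            freqBySite.computeIfAbsent(p.term, t -> new TreeMap<>())
                      .merge(p.site, p.freq, Integer::sum);
            docFreq.merge(p.term, 1, Integer::sum);
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> e : freqBySite.entrySet()) {
            for (Map.Entry<String, Integer> s : e.getValue().entrySet()) {
                rows.add("[" + e.getKey() + "," + s.getValue() + ","
                        + docFreq.get(e.getKey()) + "," + s.getKey() + "]");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Posting> postings = List.of(
                new Posting("management", "example.com", 7),
                new Posting("management", "example.com", 5),
                new Posting("management", "other.org", 3));
        dump(postings).forEach(System.out::println);
    }
}
```

The third column is a document frequency, not an IDF; converting it is a separate step using your chosen similarity formula.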
For tf*idf, see Lucene's DefaultSimilarity (its tf() and idf() methods) and this question for some comments.
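For reference, DefaultSimilarity in Lucene 3.x computes tf as sqrt(freq) and idf as 1 + ln(numDocs / (docFreq + 1)). The sketch below re-implements those two factors with `double` instead of Lucene's `float` return type, so you can turn a document frequency into an IDF outside of Lucene. The class name is hypothetical; the numbers in `main` are made-up inputs, not data from the question.

```java
/** Sketch of the tf and idf factors used by Lucene 3.x DefaultSimilarity (double precision). */
public class DefaultSimilaritySketch {

    /** tf = sqrt(freq): dampens repeated occurrences within one document. */
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    /** idf = 1 + ln(numDocs / (docFreq + 1)): rarer terms get a larger weight. */
    public static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 1000; // e.g. reader.numDocs()
        int docFreq = 99;   // e.g. reader.docFreq(term)
        System.out.printf("tf(4) = %.3f%n", tf(4));
        System.out.printf("idf = %.3f%n", idf(docFreq, numDocs));
    }
}
```

With these inputs, idf is 1 + ln(1000 / 100) = 1 + ln 10, about 3.303.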
If you are looking to measure the distribution of distinct terms across the documents, then a histogram is what you want. Check the LukeRequestHandler example.
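As a rough illustration, the LukeRequestHandler is usually exposed at `/admin/luke` on a core, and its `fl` and `numTerms` parameters restrict the output to one field and cap the number of top terms returned. For that field, the response should include the top terms and a histogram of document frequencies. The sketch below only builds such a URL; the core URL and field name are hypothetical, and note that Luke reports index-wide statistics, not per-website ones.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: build a LukeRequestHandler URL for top terms and a docFreq histogram of one field. */
public class LukeUrl {

    public static String build(String solrBase, String field, int numTerms) {
        // Assumes the handler is registered at /admin/luke (the usual default)
        return solrBase + "/admin/luke"
                + "?fl=" + URLEncoder.encode(field, StandardCharsets.UTF_8)
                + "&numTerms=" + numTerms
                + "&wt=json";
    }

    public static void main(String[] args) {
        // Hypothetical core URL and field name
        System.out.println(build("http://localhost:8983/solr/collection1", "content", 50));
    }
}
```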