user1834873

Reputation: 13

How to retrieve all terms with their frequency in each website

I have indexed the data of 10 websites in Solr. Now I want to dump the data of each website in the following format: [term, frequency of the term in that website, IDF, website]

e.g. [management,12,145,example.com]
where 12 is the frequency of the term in example.com and 145 is the IDF of the term in the index.

Can I do this with Solr, and if so, how?

Upvotes: 1

Views: 169

Answers (2)

ffriend

Reputation: 28552

You can do this with the low-level Lucene API (this is the Lucene 3.x API; TermDocs was removed in Lucene 4):

IndexReader reader = IndexReader.open(directory);
TermDocs termDocs = reader.termDocs();   // unpositioned enumerator; position it with termDocs.seek(term) or use the variant below
// TermDocs termDocs = reader.termDocs(term);   // if you need the docs containing a specific term
while (termDocs.next()) {
    System.out.println("Doc #: " + termDocs.doc());
    System.out.println("Full document: " + reader.document(termDocs.doc()));
    System.out.println("Term frequency: " + termDocs.freq());        
}

For tf*idf see DefaultSimilarity and this question for some comments.
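
To make the last step concrete, here is a minimal sketch of how the per-document frequencies printed above could be summed per website and combined with an idf value into the rows from the question. It assumes you have already read each document's website and the term's document frequency from the index. `SiteTermStats` and its methods are hypothetical helpers, not part of Lucene or Solr; only the two formulas mirror what DefaultSimilarity computes (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not a Lucene or Solr class.
class SiteTermStats {
    // website -> (term -> summed frequency across that website's documents)
    private final Map<String, Map<String, Integer>> freqBySite = new TreeMap<>();

    // Same form as DefaultSimilarity.tf(freq): sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Same form as DefaultSimilarity.idf(docFreq, numDocs): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    // Record one (document, term) observation, e.g. from termDocs.freq().
    void add(String website, String term, int freqInDoc) {
        freqBySite.computeIfAbsent(website, k -> new TreeMap<>())
                  .merge(term, freqInDoc, Integer::sum);
    }

    // Total frequency of a term within one website (0 if never seen).
    int frequency(String website, String term) {
        return freqBySite
                .getOrDefault(website, Collections.<String, Integer>emptyMap())
                .getOrDefault(term, 0);
    }

    // Build rows of the form [term,frequency,idf,website].
    // docFreq maps each term to the number of documents containing it (assumed to be supplied by the caller).
    List<String> rows(Map<String, Integer> docFreq, int numDocs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> site : freqBySite.entrySet()) {
            for (Map.Entry<String, Integer> t : site.getValue().entrySet()) {
                double idfValue = idf(docFreq.getOrDefault(t.getKey(), 0), numDocs);
                out.add(String.format(Locale.ROOT, "[%s,%d,%.3f,%s]",
                        t.getKey(), t.getValue(), idfValue, site.getKey()));
            }
        }
        return out;
    }
}
```

Note that the idf here is index-wide, as in the question, while the frequency is summed per website. If you want raw document frequency instead of the smoothed idf, print the docFreq value directly.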

Upvotes: 0

RoiG

Reputation: 478

If you are looking to measure the distribution of the distinct terms across the documents, then a histogram is what you want. Check the LukeRequestHandler example.
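
As a rough sketch of what such a request might look like, the snippet below only builds the URL. It assumes the handler is registered at /admin/luke under the core (as in Solr's example configuration) and uses its `fl` and `numTerms` parameters. The host, core name, and field name are placeholders, and `LukeUrl` is a hypothetical helper, not a Solr class.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of Solr.
class LukeUrl {
    // coreUrl is assumed to look like http://localhost:8983/solr/collection1 (placeholder).
    static String build(String coreUrl, String field, int numTerms) {
        try {
            return coreUrl + "/admin/luke"
                    + "?fl=" + URLEncoder.encode(field, "UTF-8")   // field to inspect
                    + "&numTerms=" + numTerms                      // how many top terms to return
                    + "&wt=json";                                  // response format
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/collection1", "content", 50));
    }
}
```

The response reports top terms and a document-frequency histogram for the whole index, not per website, so you would still need a per-site breakdown on top of it.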

Upvotes: 1
