How can IDF be different for several documents?

Question

I am using LETOR to build an information retrieval system. Its features include TF and IDF. I am sure TF is query-dependent. I would expect IDF to be too, but I found this note:

"Note that IDF is document independent, and so all the documents under a query have same IDF values."

That does not make sense to me, because IDF is part of the feature list for each document. How is the IDF for each document calculated?

jshen · Accepted Answer

IDF is term specific. The IDF of any given term is document independent, but the TF is document specific.

To put it differently, let's say we have 3 documents:

doc id 1 "The quick brown fox jumps over the lazy dog"

doc id 2 "The Sly Fox Pub Annapolis is located on church circle"

doc id 3 "Located on Church Circle, in the heart of the Historic District"

Now if IDF is (number of documents) / (number of documents containing term t), then the IDF for the term "fox" is 3/2, since it appears in doc ids 1 and 2 (ignoring case), regardless of what the search is or which document is being scored. So IDF is a function of t alone.

TF, on the other hand, is a function of t and d. So the TF of "the" for doc id 1 is 2.
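The same numbers can be reproduced with a minimal Python sketch. It assumes the simple ratio definition of IDF used above (no logarithm or smoothing) and case-insensitive matching on alphabetic tokens; the helper names `tokenize`, `idf`, and `tf` are made up for this illustration and are not part of LETOR.

```python
import re
from collections import Counter

# The three example documents from above, keyed by doc id.
docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "The Sly Fox Pub Annapolis is located on church circle",
    3: "Located on Church Circle, in the heart of the Historic District",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())

tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}

def idf(term):
    """Depends on the term and the collection only, never on a single document.

    Raises ZeroDivisionError if the term occurs in no document.
    """
    n_containing = sum(1 for toks in tokens.values() if term in toks)
    return len(tokens) / n_containing

def tf(term, doc_id):
    """Depends on both the term and the document: raw count of term in doc."""
    return Counter(tokens[doc_id])[term]

print(idf("fox"))    # 1.5, i.e. 3/2
print(tf("the", 1))  # 2

# One (TF, IDF) pair per document for the query term "fox":
# TF varies from row to row, IDF is the same in every row.
for doc_id in docs:
    print(doc_id, tf("fox", doc_id), idf("fox"))
```

The loop at the end prints one row per document. The IDF column is identical in all three rows, which is what the quoted note is describing: IDF can sit in every document's feature list and still have the same value for all documents under a query.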
