sabu

Reputation: 307

Inter-document similarity (cosine similarity)

I am trying to write a program that finds the similarity between two document files. To do this, I am following this link and a posting from

However, I get the error

"list object is not callable"

on this line:

test(tf_idf_matrix,count,nltkutil.cosine_distance)

I am using one file as the training set and the other as the test set. My goal is to use test() to output the cosine similarity between the two documents using tf-idf.

My code is as follows:

def test(tdMatrix,count,fsim):

    sims=[] 
    sims = np.zeros((len(tdMatrix), count))

    for i in range(len(tdMatrix)):
        for j in range(count):
                doc1 = np.asarray(tdMatrix[tdMatrix[i], :].todense()).reshape(-1)
                doc2 = np.asarray(tdMatrix[tdMatrix[j], :].todense()).reshape(-1)

                sims[i, j] = fsim(doc1, doc2)

        print sims

def main():

    file_set=["corpusA.txt","corpusB.txt"]
    train=[]
    test=[]

    for file1 in file_set:
        s="x"+file1
        preprocess(file1,s)

    count_vectorizer = CountVectorizer()
    m=open("xcorpusA.txt",'r')
    for i in m:
        train.append(i.strip())
    #print doc
    count_vectorizer.fit_transform(train)


    m1=open("xcorpusB.txt",'r')
    for i in m1:
        test.append(i.strip())

    freq_term_matrix = count_vectorizer.transform(test)
    #print freq_term_matrix.todense()

    tfidf = TfidfTransformer(norm="l2")
    tfidf.fit(freq_term_matrix)

    #print "IDF:", tfidf.idf_

    tf_idf_matrix = tfidf.transform(freq_term_matrix)
    print (tf_idf_matrix.toarray())

    count=0

    for i in tf_idf_matrix.toarray():
        for j in i:
            count+=1    
        break

    print "Results with Cosine Distance Similarity Measure"
    test(tf_idf_matrix,count,nltkutil.cosine_distance)


if __name__ == "__main__":
    main()

Upvotes: 0

Views: 298

Answers (1)

jonrsharpe

Reputation: 121975

In main() you define a list named test:

test=[]

This list shadows the function named test() you define outside main(), so when you try:

test(tf_idf_matrix,count,nltkutil.cosine_distance)

Python attempts to call the list with the supplied arguments. A list is not callable, so this raises TypeError: 'list' object is not callable.

To fix this, rename either the list, the function, or (ideally) both with names that more clearly describe what they are for.
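As a rough sketch of what the renaming could look like, assuming the intent is to compare every document row with every other row: the names pairwise_similarity, cosine_similarity and test_docs are hypothetical, plain lists with made-up weights stand in for the tf-idf matrix, and a hand-written cosine similarity stands in for nltkutil.cosine_distance.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector is all zeros (an assumption made here
    to avoid dividing by zero).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(rows, fsim):
    """Formerly test(): apply fsim to every pair of rows."""
    n = len(rows)
    sims = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sims[i][j] = fsim(rows[i], rows[j])
    return sims


def main():
    # Formerly the list named test; the new name no longer hides a function.
    test_docs = [
        [1.0, 0.0, 2.0],  # made-up term weights, not real tf-idf output
        [0.0, 3.0, 0.0],
        [2.0, 0.0, 4.0],
    ]
    return pairwise_similarity(test_docs, cosine_similarity)


sims = main()
```

With distinct names for the list and the function, the call inside main() reaches the function as intended.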

Upvotes: 1
