I am trying to write a program that finds the similarity between two documents stored in two separate files. To do this, I am following this link and a posting from
But I get the error `TypeError: 'list' object is not callable` on this line:

```python
test(tf_idf_matrix,count,nltkutil.cosine_distance)
```
I am using one file as the training set and the other as the test set. My goal is to use `test()` to output the cosine similarity between the two documents, using tf-idf.
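For reference, the cosine similarity between two tf-idf document vectors $\mathbf{a}$ and $\mathbf{b}$ is their dot product divided by the product of their Euclidean lengths. Because `TfidfTransformer(norm="l2")` scales each row to unit length, for any non-zero rows it produces the denominator is 1 and the similarity reduces to a plain dot product:

$$
\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert_2 \, \lVert \mathbf{b} \rVert_2} = \frac{\sum_k a_k b_k}{\sqrt{\sum_k a_k^2}\,\sqrt{\sum_k b_k^2}}, \qquad \lVert \mathbf{a} \rVert_2 = \lVert \mathbf{b} \rVert_2 = 1 \;\Rightarrow\; \cos(\mathbf{a}, \mathbf{b}) = \sum_k a_k b_k
$$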
My code is as follows:
```python
def test(tdMatrix,count,fsim):
    sims=[]
    sims = np.zeros((len(tdMatrix), count))
    for i in range(len(tdMatrix)):
        for j in range(count):
            doc1 = np.asarray(tdMatrix[tdMatrix[i], :].todense()).reshape(-1)
            doc2 = np.asarray(tdMatrix[tdMatrix[j], :].todense()).reshape(-1)
            sims[i, j] = fsim(doc1, doc2)
    print sims

def main():
    file_set=["corpusA.txt","corpusB.txt"]
    train=[]
    test=[]
    for file1 in file_set:
        s="x"+file1
        preprocess(file1,s)
    count_vectorizer = CountVectorizer()
    m=open("xcorpusA.txt",'r')
    for i in m:
        train.append(i.strip())
    #print doc
    count_vectorizer.fit_transform(train)
    m1=open("xcorpusB.txt",'r')
    for i in m1:
        test.append(i.strip())
    freq_term_matrix = count_vectorizer.transform(test)
    #print freq_term_matrix.todense()
    tfidf = TfidfTransformer(norm="l2")
    tfidf.fit(freq_term_matrix)
    #print "IDF:", tfidf.idf_
    tf_idf_matrix = tfidf.transform(freq_term_matrix)
    print (tf_idf_matrix.toarray())
    count=0
    for i in tf_idf_matrix.toarray():
        for j in i:
            count+=1
        break
    print "Results with Cosine Distance Similarity Measure"
    test(tf_idf_matrix,count,nltkutil.cosine_distance)

if __name__ == "__main__":
    main()
```
In `main()` you define a list named `test`:

```python
test=[]
```

This list shadows the function `test()` that you define outside `main()`, so when you try:

```python
test(tf_idf_matrix,count,nltkutil.cosine_distance)
```

Python attempts to call the list with the supplied arguments. Since a list is not callable, you inevitably get a `TypeError`.
To fix this, rename either the list, the function, or (ideally) both with names that more clearly describe what they are for.
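As a sketch of that fix, assuming the intent is a document-by-document similarity table, the version below renames the function to `pairwise_similarities` and the list to `test_docs` (both names are hypothetical). It uses plain Python lists instead of the question's SciPy sparse matrix so it runs on its own. It also indexes rows directly as `rows[i]`; the original `tdMatrix[tdMatrix[i], :]` looks like a separate indexing bug, and with a sparse matrix the row would normally be `tdMatrix[i, :]`, with `tdMatrix.shape[0]` in place of `len(tdMatrix)` (calling `len()` on a SciPy sparse matrix raises its own `TypeError`).

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarities(rows, sim_fn):
    """Return an n x n table of sim_fn(rows[i], rows[j]).

    Renamed from test(), so it can no longer collide with a list in main().
    """
    n = len(rows)
    return [[sim_fn(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def main():
    # Hypothetical dense tf-idf rows; renamed from `test` to `test_docs`.
    test_docs = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    return pairwise_similarities(test_docs, cosine_similarity)


sims = main()
print(sims)
```

For these two sample rows the table's diagonal is 1.0 and the off-diagonal entries are 0.5, up to floating-point rounding.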