SUNDONG

Reputation: 2769

General Matrix computation in Python, TF-IDF

While building a TF-IDF module, I ran into this matrix-vector computation.

A % b = C

[[1,2], [3,4]] % [1/2, 1/3] = [[1/2, 2/3], [3/2, 4/3]]

Here A is a Documents x Words matrix where A_ij is the term-frequency count of word j in document i. The vector b holds a pre-calculated IDF value for each word; for instance, b_j is 1/7 if word j appears in 7 different documents.
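To make the definition of b concrete, here is a minimal sketch, assuming Python 3 and that "used among 7 different documents" means the word has a nonzero count in 7 rows of A. The helper name inverse_document_frequency and the choice to give unused words a weight of 0 are assumptions for illustration, not part of any library.

```python
import numpy as np

def inverse_document_frequency(A):
    """Return b where b[j] = 1 / (number of documents containing word j)."""
    A = np.asarray(A)
    # Per column (word), count the documents (rows) with a nonzero term count.
    doc_freq = np.count_nonzero(A, axis=0)
    # Assumption: a word that appears in no document gets weight 0
    # instead of triggering a division by zero.
    b = np.zeros(A.shape[1], dtype=float)
    np.divide(1.0, doc_freq, out=b, where=doc_freq > 0)
    return b

# Two documents, three words; the third word never occurs.
A = np.array([[1, 2, 0],
              [3, 4, 0]])
b = inverse_document_frequency(A)
```

With this A, the first two words each appear in both documents, so their weights are 1/2, and the unused third word gets 0.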

What is this column-wise multiplication called? And is there an existing Python library that supports this operation?

Upvotes: 1

Views: 156

Answers (1)

Mikhail M.

Reputation: 5978

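Use NumPy for it.

This is element-wise multiplication with broadcasting: the 1-D vector b is stretched across the rows of A, so column j of A is scaled by b[j]: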

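import numpy as np
# Rows are documents, columns are words.
A = np.array([[1, 2], [3, 4]])
# One weight per word (column); 1.0 keeps the division in floating point on Python 2 as well.
b = np.array([1.0 / 2, 1.0 / 3])
# b is broadcast across the rows, scaling column j by b[j].
print(A * b)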

output:

[[ 0.5         0.66666667]
 [ 1.5         1.33333333]]

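For a SciPy sparse csr_matrix, use its multiply method, which is also element-wise:

from scipy.sparse import csr_matrix
x1 = csr_matrix([[1, 2], [3, 4]])
# A 1 x 2 sparse row vector holding one weight per column.
x2 = csr_matrix([1.0 / 2, 1.0 / 3])
# multiply() is element-wise (not a matrix product); todense() is only for printing.
print(x1.multiply(x2).todense())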

output:

[[ 0.5         0.66666667]
 [ 1.5         1.33333333]]
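As for what to call it: scaling column j by b[j] is the same as right-multiplying A by the diagonal matrix diag(b), so it is often described as column scaling. The following is a minimal sketch checking that equivalence, assuming Python 3 with NumPy and SciPy installed; the variable names are only illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

A = np.array([[1, 2], [3, 4]], dtype=float)
b = np.array([1 / 2, 1 / 3])

# Dense: broadcasting scales column j by b[j], the same as A @ diag(b).
dense_broadcast = A * b
dense_diag = A @ np.diag(b)

# Sparse: right-multiplying by a sparse diagonal matrix keeps the result sparse.
A_sparse = csr_matrix(A)
sparse_scaled = A_sparse @ diags(b)

same_dense = np.allclose(dense_broadcast, dense_diag)
same_sparse = np.allclose(sparse_scaled.toarray(), dense_broadcast)
```

For a large, sparse term-frequency matrix, the diagonal form can be convenient because the result stays sparse and never has to be densified.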

Upvotes: 2
