Claudia Guirao

Co-occurrence Matrix in Python, scipy coo_matrix

I have a document-term matrix, built from the co-occurrence of terms in a corpus, as explained here:

import scipy.sparse

vocabulary = {}  # map terms to column indices
data = []        # values (maybe weights)
row = []         # row (document) indices
col = []         # column (term) indices

# bloblist is the corpus: one entry per document, each an iterable of terms
for i, doc in enumerate(bloblist):
    for term in doc:
        # get column index, adding the term to the vocabulary if needed
        j = vocabulary.setdefault(term, len(vocabulary))
        data.append(1)  # uniform weights
        row.append(i)
        col.append(j)
A = scipy.sparse.coo_matrix((data, (row, col)))

>>> print(A)

(0, 0)  1
(0, 1)  1
(0, 2)  1
(0, 3)  1
...

Now I would like to export it to a CSV file or write it to a database. I can't figure out how; I don't know how to work with sparse matrices.

Whenever I try, I get this error:

TypeError: 'coo_matrix' object has no attribute '__getitem__'

Answers (2)

Zachi Shtain

SciPy has many formats for sparse matrices. You could convert the matrix to one of the other formats using methods such as tocsc() or tocsr(); unlike coo_matrix, these support indexing (e.g. A[i, j]), which lets you access their elements.
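A minimal sketch of this, using a hypothetical toy corpus in place of the question's bloblist (which is not shown). It builds the COO matrix the same way as the question, converts it with tocsr() so individual elements can be indexed, and then writes one (document, term, count) row per stored value to CSV. One detail worth knowing: coo_matrix keeps duplicate (row, col) entries, and tocsr() sums them, so a term that occurs twice in a document ends up with a count of 2.

```python
import csv
import io

import scipy.sparse

# Hypothetical toy corpus standing in for the question's bloblist
bloblist = [["the", "cat", "sat"], ["the", "cat", "the", "mat"]]

# Build the COO matrix the same way as in the question
vocabulary, data, row, col = {}, [], [], []
for i, doc in enumerate(bloblist):
    for term in doc:
        j = vocabulary.setdefault(term, len(vocabulary))
        data.append(1)
        row.append(i)
        col.append(j)
A = scipy.sparse.coo_matrix((data, (row, col)))

# CSR supports A[i, j] indexing. tocsr() also sums duplicate (row, col)
# entries, so the repeated "the" in document 1 becomes a count of 2.
B = A.tocsr()
print(B[1, vocabulary["the"]])  # 2

# For export, go back to COO (duplicates are now summed) and write
# one (document, term, count) row per stored value.
C = B.tocoo()
terms = {j: t for t, j in vocabulary.items()}  # column index -> term
buf = io.StringIO()  # stand-in for open("output.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["doc", "term", "count"])
for i, j, v in zip(C.row, C.col, C.data):
    writer.writerow([i, terms[int(j)], v])
print(buf.getvalue())
```

The io.StringIO buffer only keeps the sketch self-contained; replacing it with open("output.csv", "w", newline="") writes a real file. The same (doc, term, count) triples could also be inserted as rows into a database table.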

paul-g

Please have a look at the input/output section of SciPy (scipy.io). You can use scipy.io.mmwrite to write the matrix in the Matrix Market format, which is a standard format for storing sparse matrices.

Below is an example that creates a random sparse matrix and writes it out as a Matrix Market (.mtx) file:

>>> import scipy.sparse
>>> A = scipy.sparse.rand(20, 20)
>>> print(A)
  (3, 4)    0.0579085844686
  (14, 9)   0.914421740712
  (15, 10)  0.622861279405
  (5, 17)   0.83146022149
>>> import scipy.io
>>> scipy.io.mmwrite('output', A)

The contents of output.mtx:

→ cat output.mtx 
%%MatrixMarket matrix coordinate real general
%
20 20 4
4 5 0.05790858446861069
15 10 0.9144217407118101
16 11 0.6228612794046831
6 18 0.8314602214903816
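Note that Matrix Market indices are 1-based, which is why the entry printed as (3, 4) appears as 4 5 in the file. A minimal round-trip sketch, assuming a temporary directory as the output location (any writable path works), reads the file back with scipy.io.mmread and checks that nothing was lost:

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse

# Random 20x20 sparse matrix (default density 0.01, so 4 stored values)
A = scipy.sparse.rand(20, 20, format="coo")

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "output.mtx")
    scipy.io.mmwrite(path, A)  # row/column indices are written 1-based
    B = scipy.io.mmread(path)  # read back as a sparse COO matrix

# Same shape and values after the round trip
print(B.shape)                                # (20, 20)
print(np.allclose(A.toarray(), B.toarray()))  # True
```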
