Kshitij Yadav

Reputation: 1387

Convert 1-D array into sparse matrix

I am working on a recommendation project where I have data like this:

ID Movie
1   A
2   B
3   C
4   D
..
..

I want to convert this dataframe into a sparse matrix like this:

     1  2  3  4 ....n

1    1  0  0  0     0
2    0  1  0  0     0
3    0  0  1  0     0
4    0  0  0  1     0
.
.
n    0  0  0  0     1

Basically, both the rows and the columns are indexed by movie ID, and the value is 1 where the row ID and the column ID are the same. I want to represent this in a sparse format like

 <sparse matrix of type '<class 'numpy.int32'>'
    with 58770 stored elements in Compressed Sparse Row format>

I tried doing the following:

 - np.diag(items)
 - csr_matrix(items.values)

But neither gives me the sparse matrix I want. Can anyone help me?

Upvotes: 0

Views: 371

Answers (2)

Ricky Kim

Reputation: 2022

You can use scipy.sparse.spdiags

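import numpy as np
from scipy import sparse
num_data = len(df)  # one row and one column per movie
sp = sparse.spdiags(np.ones(num_data), 0, num_data, num_data)  # ones on the main diagonal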

OUTPUT

  (0, 0)    1.0
  (1, 1)    1.0
  (2, 2)    1.0
  (3, 3)    1.0

If the movie IDs are not consecutive (for example, they have gaps), use the IDs themselves as row and column indices:

sparse.coo_matrix((np.ones(num_data),(df['ID'],df['ID'])))

If the IDs come from two different dataframes, keep only the IDs present in both:

match = list(set(df['ID']).intersection(set(df2['ID'])))
sparse.coo_matrix((np.ones(len(match)), (match, match)))  # data length must equal len(match)
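
As a rough sketch of the non-consecutive case, the snippet below builds the COO matrix and converts it to the CSR format the question asks for. The `ids` array is made-up sample data standing in for `df['ID']`, and using an `int32` dtype is an assumption based on the repr shown in the question.

```python
import numpy as np
from scipy import sparse

# Hypothetical sample standing in for df['ID']; note the gap (no ID 4).
ids = np.array([1, 2, 3, 5])

# One stored value per ID, as int32 to match the repr in the question.
data = np.ones(len(ids), dtype=np.int32)

# No explicit shape is given, so coo_matrix infers it from the largest
# index + 1. Rows/columns for unused indices (0 and 4 here) stay empty.
m = sparse.coo_matrix((data, (ids, ids))).tocsr()

print(m.shape)  # (6, 6)
print(m.nnz)    # 4
```

Because the IDs are used directly as indices, a very large maximum ID gives a correspondingly large (but still sparse) matrix. Mapping IDs to consecutive positions first would avoid that, at the cost of keeping a lookup table.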

Upvotes: 1

Code-Apprentice

Reputation: 83517

A matrix with ones on the diagonal and zeros everywhere else is called an "identity matrix". You can create one in Python with scipy.sparse.identity(n). See the SciPy documentation for scipy.sparse.identity for details.
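
A minimal sketch of that call, assuming the CSR format and `int32` dtype from the question are what is wanted. The value of `n` is a placeholder; in the question it would be the number of movies, e.g. `len(df)`.

```python
import numpy as np
from scipy import sparse

n = 4  # placeholder size; in practice something like len(df)

# dtype and format are optional keyword arguments of scipy.sparse.identity;
# by default it returns a float matrix in DIA format.
eye = sparse.identity(n, dtype=np.int32, format="csr")

print(repr(eye))      # summary line: shape, dtype, stored elements, format
print(eye.toarray())  # dense view, only sensible for small n
```

This only fits the case where the IDs map one-to-one onto positions 0..n-1; it does not use the ID values themselves.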

Upvotes: 1
