Reputation: 1387
I am working on a recommendation project where I have data like this:
ID Movie
1 A
2 B
3 C
4 D
..
..
I want to convert this dataframe into a sparse matrix like this:
1 2 3 4 ....n
1 1 0 0 0 0
2 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 1 0
.
.
n 0 0 0 0 1
Basically, both the rows and the columns contain the ID of the movie, and the value is 1 when the row and column elements have the same ID. I want to represent this in a sparse format like
<sparse matrix of type '<class 'numpy.int32'>'
with 58770 stored elements in Compressed Sparse Row format>
I tried doing the following:
- np.diag(items)
- csr_matrix(items.values)
But I am not able to figure it out. Can anyone help me?
Upvotes: 0
Views: 371
Reputation: 2022
You can use scipy.sparse.spdiags:
import numpy as np
from scipy import sparse
num_data = len(df)
sp = sparse.spdiags(np.ones(num_data), 0, num_data, num_data)  # ones on the main diagonal (offset 0)
OUTPUT
(0, 0) 1.0
(1, 1) 1.0
(2, 2) 1.0
(3, 3) 1.0
If the movie IDs are not consecutive (for example, they have gaps):
sparse.coo_matrix((np.ones(num_data),(df['ID'],df['ID'])))
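Note that using the raw IDs as coordinates produces a matrix whose size is the largest ID plus one, not the number of movies. As an alternative, here is a minimal sketch that first maps each ID to a contiguous position. The `ids` array is made-up example data standing in for `df['ID']`, and it assumes each ID appears only once.

```python
import numpy as np
from scipy import sparse

# Hypothetical, non-consecutive movie IDs (stand-in for df['ID'].to_numpy()).
ids = np.array([10, 42, 7, 99])

# unique_ids is sorted; positions[i] is the index of ids[i] within unique_ids.
unique_ids, positions = np.unique(ids, return_inverse=True)
n = len(unique_ids)

# Place a 1 at (position, position) for every ID, then convert to CSR.
mat = sparse.coo_matrix(
    (np.ones(len(ids), dtype=np.int32), (positions, positions)),
    shape=(n, n),
).tocsr()
```

Row and column `k` of `mat` then correspond to `unique_ids[k]`, so the matrix stays `n x n` no matter how large the IDs are.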
If the IDs come from two different dataframes:
match=list(set(df['ID']).intersection(set(df2['ID'])))
sparse.coo_matrix((np.ones(len(match)),(match,match)))
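The data array has to be the same length as the coordinate arrays, so the number of ones should follow the number of matched IDs rather than the length of `df`. A small sketch of the two-dataframe case follows. Both dataframes are made-up examples, and it assumes the IDs in `df` are already the positions 0..n-1.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Hypothetical example data; IDs in df are assumed to be 0..n-1.
df = pd.DataFrame({"ID": [0, 1, 2, 3], "Movie": list("ABCD")})
df2 = pd.DataFrame({"ID": [1, 3, 5]})

# Sorted, unique IDs present in both dataframes.
match = np.intersect1d(df["ID"], df2["ID"])

n = len(df)
# One entry per matched ID; fix the shape so it does not depend on the largest match.
shared = sparse.coo_matrix(
    (np.ones(len(match), dtype=np.int32), (match, match)),
    shape=(n, n),
).tocsr()
```

Only the diagonal entries for IDs 1 and 3 are stored here; the other rows stay empty.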
Upvotes: 1
Reputation: 83517
A matrix with ones down the diagonal and zeros everywhere else is called an "identity matrix". You can create one in Python with scipy.sparse.identity(n). See the SciPy documentation for scipy.sparse.identity for details.
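For the int32 CSR result the question asks for, something along these lines should work. It is a sketch in which `n` is a placeholder for the number of movies (for example, `len(df)`).

```python
import numpy as np
from scipy import sparse

n = 4  # placeholder; in the question this would be the number of movies

# Identity matrix stored directly as int32 in Compressed Sparse Row format.
eye = sparse.identity(n, dtype=np.int32, format="csr")
```

Passing `dtype` and `format` here avoids a separate conversion step, since `identity` otherwise defaults to a float matrix in diagonal format.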
Upvotes: 1