RadonNikodym

Reputation: 3

Error in building sparse matrix Python Scipy.sparse

In my code I iterate over a file and build three lists:

data, row, column

to construct a sparse matrix. It represents a rating matrix in which user u has rated item i with a rating from 1 to 5. When I check the sparse matrix afterwards, some of the reported ratings are greater than 5, which should not be possible. I checked the file and there is no rating greater than 5, and I also checked the data list and it contains no value greater than 5, so the error probably occurs when building the matrix with sparse.coo_matrix().

See my code below:

from scipy import sparse
import numpy as np

row = []
column = []
data = []

with open(filename, 'r') as f:
    for line in f:
        if not line[0].isdigit():
            continue
        line = line.strip()
        elem = line.split(',')

        userid = int(elem[0].strip())
        businessid = int(elem[1].strip())
        rating = float(elem[2].strip())

        row.append(userid)
        column.append(businessid)
        data.append(rating)

#data = np.array(data)

# Check whether any rating read from the file is greater than 5
# (there is none).
for rating in data:
    if rating > 5:
        print(rating)

total = sparse.coo_matrix((data, (row, column)), dtype=float).tocsr()

# Check whether any rating in the sparse matrix is greater than 5
# (there is!).
row, column = total.nonzero()

for indr, indc in zip(row, column):
    if total[indr, indc] > 5:
        print('---')
        print(total[indr, indc])
        print(indr)
        print(indc)

And here is the beginning of my file:

user,item,rating
480,0,5
16890,0,2
5768,0,4
319,1,1
4470,1,4
7555,1,5
8768,1,5

Do you have any idea why I'm getting this error when building the matrix?

Thanks a lot!

Upvotes: 0

Views: 324

Answers (1)

maxymoo

Reputation: 36555

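From the docs for coo_matrix.tocsr():

Duplicate entries will be summed together

So if the same (user, item) pair appears more than once in your file, its ratings are added together in the resulting matrix, which would explain values greater than 5.

(I have no idea why it does this.)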
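To illustrate, here is a minimal sketch of the summing behaviour and one way to detect and remove duplicates before building the matrix. The sample values are made up (the question's file excerpt shows no duplicate pair), and "keep the last rating seen" is only one possible policy, not necessarily what the asker wants. For what it's worth, the coo_matrix documentation notes that summing duplicates facilitates building finite element matrices and the like.

```python
from collections import Counter

from scipy import sparse

# Hypothetical sample: user 480 rated item 0 twice (5 and 4).
row = [480, 16890, 480]
column = [0, 0, 0]
data = [5.0, 2.0, 4.0]

# Converting COO to CSR sums the two entries stored at (480, 0).
total = sparse.coo_matrix((data, (row, column)), dtype=float).tocsr()
summed_value = total[480, 0]  # 5.0 + 4.0

# Find (row, column) pairs that occur more than once in the input lists.
pair_counts = Counter(zip(row, column))
duplicates = [pair for pair, count in pair_counts.items() if count > 1]

# One possible policy (an assumption): keep only the last rating per pair.
last_rating = {}
for r, c, v in zip(row, column, data):
    last_rating[(r, c)] = v

dedup_rows = [r for r, _ in last_rating]
dedup_cols = [c for _, c in last_rating]
dedup_data = list(last_rating.values())

deduped = sparse.coo_matrix(
    (dedup_data, (dedup_rows, dedup_cols)), dtype=float
).tocsr()
```

If the duplicates list is non-empty for your real data, that would account for the ratings above 5; averaging the duplicate ratings instead of keeping the last one is an equally reasonable choice.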

Upvotes: 1
