Reputation: 23
with open('similarities/EuclideanSimilarity.csv', 'w') as result_file:
    print('user1,user2,similarity', file=result_file)
    print('Calculating similarities between users...')
    for u1 in tqdm(users, total=len(users)):
        for u2 in users:
            ratings1 = np.nan_to_num(np.array(user_ratings_matrix.iloc[u1 - 1].values))
            ratings2 = np.nan_to_num(np.array(user_ratings_matrix.iloc[u2 - 1].values))
            sim = 1 / (1 + distance.euclidean(ratings1, ratings2))
            print(f"{u1},{u2},{sim}", file=result_file)
~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1371
   1372             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1373             return self._getitem_axis(maybe_callable, axis=axis)
   1374
   1375     def _is_scalar_access(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1828
   1829             # validate the location
-> 1830             self._is_valid_integer(key, axis)
   1831
   1832             return self._get_loc(key, axis=axis)

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _is_valid_integer(self, key, axis)
   1711         l = len(ax)
   1712         if key >= l or key < -l:
-> 1713             raise IndexError("single positional indexer is out-of-bounds")
   1714         return True
   1715

IndexError: single positional indexer is out-of-bounds
Upvotes: 1
Views: 1648
Reputation: 13999
You don't give enough information about the type/contents of `users` or `user_ratings_matrix` to reliably answer your question. If I assume that `users` is a list of user IDs, and that `user_ratings_matrix` is a standard Pandas `DataFrame` whose rows are in the same order as `users`, then you can rewrite your `for` loops like so:
for u1, row1 in tqdm(zip(users, user_ratings_matrix.itertuples(index=False, name=None)), total=len(users)):
    for u2, row2 in zip(users, user_ratings_matrix.itertuples(index=False, name=None)):
        ratings1 = np.nan_to_num(np.array(row1))
        ratings2 = np.nan_to_num(np.array(row2))
        sim = 1 / (1 + distance.euclidean(ratings1, ratings2))
        print(f"{u1},{u2},{sim}", file=result_file)
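`user_ratings_matrix.itertuples(index=False, name=None)` iterates over the rows of your dataframe and returns each one as a plain tuple. `zip(users, user_ratings_matrix.itertuples(index=False, name=None))` then iterates over pairs of `(userID, tuple(dataframe_row))`. Because each row is paired with its ID by position, the loop no longer relies on `iloc[u1 - 1]`, which raises the `IndexError` whenever `u1 - 1` is not a valid row position.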
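To see what those two expressions yield, here is a minimal sketch on a made-up three-user dataframe. The ratings, the non-contiguous IDs in `users`, and the helper name `pairwise_similarities` are all assumptions for illustration, not taken from the question. It also uses `np.linalg.norm` in place of `distance.euclidean` (the same Euclidean distance) so that it only needs NumPy and pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three users with non-contiguous IDs, three items.
users = [10, 42, 99]
user_ratings_matrix = pd.DataFrame(
    [[5.0, np.nan, 1.0],
     [4.0, 2.0, np.nan],
     [5.0, np.nan, 1.0]],
    columns=["item_a", "item_b", "item_c"],
)


def pairwise_similarities(users, ratings):
    """Return (user1, user2, similarity) for every ordered pair of users."""
    # Convert each row once, replacing missing ratings with 0.
    rows = [
        np.nan_to_num(np.array(row, dtype=float))
        for row in ratings.itertuples(index=False, name=None)
    ]
    results = []
    for u1, r1 in zip(users, rows):
        for u2, r2 in zip(users, rows):
            # np.linalg.norm of the difference is the Euclidean distance.
            sim = 1 / (1 + np.linalg.norm(r1 - r2))
            results.append((u1, u2, sim))
    return results


# Each element pairs a user ID with that user's row as a plain tuple.
pairs = list(zip(users, user_ratings_matrix.itertuples(index=False, name=None)))
print(pairs[0])

similarities = pairwise_similarities(users, user_ratings_matrix)
for u1, u2, sim in similarities:
    print(f"{u1},{u2},{sim}")
```

Converting each row once up front also avoids repeating `np.nan_to_num` for every pair, which may matter if the real matrix is large.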
Also, before the next time you post a question on SO, you should probably read these guidelines about how to produce an example that other people can run/work with. It'll help you to get better answers on this site.
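For this question, a self-contained example might look roughly like the sketch below. The toy ratings and the non-contiguous user IDs are invented here; whether your real `users` list looks like this is exactly the detail the question leaves open. The helper `row_for` is hypothetical and only mirrors the `iloc[u1 - 1]` lookup from your code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real data: three users whose IDs
# are not the contiguous range 1..3.
users = [1, 2, 7]
user_ratings_matrix = pd.DataFrame(
    [[5.0, np.nan], [3.0, 4.0], [np.nan, 2.0]],
    columns=["item_a", "item_b"],
)


def row_for(user_id):
    """Look up a row the way the question does: by position user_id - 1."""
    return user_ratings_matrix.iloc[user_id - 1].values


error_message = None
for user_id in users:
    try:
        row_for(user_id)
    except IndexError as exc:
        # User 7 maps to position 6, but the frame only has 3 rows.
        error_message = str(exc)

print(error_message)
```

With a snippet like this, anyone can paste the code, see the failure, and check a fix against it.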
Upvotes: 0