I'm doing research on a recommendation system using the Gowalla dataset. However, the dataset has no location ratings, so I need to convert the check-ins into implicit ratings: '1' if a user has visited a location and '0' if the user has never visited it. How should I create that user-location matrix with Python?
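To make the target format concrete, here is a small sketch on hypothetical toy check-ins (made-up ids, not real Gowalla rows, and the column names `user_id` / `location_id` are assumptions). `pd.crosstab` counts visits per (user, location) pair, and clipping the counts at 1 turns them into 1/0 implicit ratings. A dense table like this is only practical for small samples; the full dataset needs a sparse matrix.

```python
import pandas as pd

# Hypothetical toy check-ins: user 0 visited location 10 once and location 11 twice,
# user 1 visited location 11 once
checkins = pd.DataFrame({
    "user_id":     [0, 0, 0, 1],
    "location_id": [10, 11, 11, 11],
})

# crosstab counts check-ins per (user, location); clip(upper=1) maps any visit count to 1,
# while pairs that never occur stay 0
implicit = pd.crosstab(checkins["user_id"], checkins["location_id"]).clip(upper=1)
print(implicit)
```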
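This snippet should do what you are asking. It creates a sparse rating matrix (scipy.sparse.csr_matrix) whose number of rows equals the number of distinct users and whose number of columns equals the number of distinct locations. Every visited (user, location) pair is stored as 1; every other cell is an implicit 0, so the never-visited entries cost no memory.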
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
# Load dataset (Gowalla check-in format: user, check-in time, latitude, longitude, location id)
# Note: pandas rejects duplicate column names, so every column needs its own name
df = pd.read_csv('gowalla.csv', sep='\t', names=['user_id', 'checkin_time', 'latitude', 'longitude', 'location_id'])
# Group interactions: one key per distinct (user, location) pair, however many check-ins it has
users_locations = df.groupby(by=['user_id', 'location_id']).size().to_dict()
# Number of different Users / Locations
nu = df['user_id'].nunique()
nl = df['location_id'].nunique()
# Build Rating matrix
row, col = zip(*users_locations.keys())  # row -> users, col -> locations
# Map raw ids to contiguous 0-based row/column indices
map_u = dict(zip(df['user_id'].unique(), range(nu)))
map_l = dict(zip(df['location_id'].unique(), range(nl)))
row_idx = [map_u[u] for u in row]
col_idx = [map_l[l] for l in col]
# Implicit rating: 1 for every visited location; unvisited cells stay implicit zeros
data = np.ones(len(row_idx), dtype=np.float32)
rating_matrix = csr_matrix((data, (row_idx, col_idx)), shape=(nu, nl))
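As an alternative to the dictionary-based index mapping, here is a minimal vectorized sketch using `pd.factorize`, assuming the same `user_id` / `location_id` column names; the helper name `implicit_rating_matrix` is hypothetical. It drops duplicate (user, location) pairs first, because `csr_matrix` sums duplicate coordinates and repeated check-ins would otherwise produce values above 1.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def implicit_rating_matrix(df, user_col="user_id", item_col="location_id"):
    """Hypothetical helper: binary user x location CSR matrix (1 = visited, implicit 0 = not)."""
    # One row per distinct (user, location) pair, so repeated check-ins count once
    pairs = df[[user_col, item_col]].drop_duplicates()
    # factorize maps each raw id to a contiguous 0..n-1 code; the uniques give the reverse mapping
    row_idx, users = pd.factorize(pairs[user_col])
    col_idx, locations = pd.factorize(pairs[item_col])
    data = np.ones(len(pairs), dtype=np.float32)
    matrix = csr_matrix((data, (row_idx, col_idx)), shape=(len(users), len(locations)))
    return matrix, users, locations


# Usage on hypothetical toy check-ins: user 7 visited 42 and 99 (twice), user 3 visited 42
checkins = pd.DataFrame({"user_id": [7, 7, 7, 3], "location_id": [42, 99, 99, 42]})
R, users, locations = implicit_rating_matrix(checkins)
print(R.toarray())
```

Row `i` of `R` corresponds to `users[i]` and column `j` to `locations[j]`, so the returned index objects let you translate matrix positions back to the original Gowalla ids.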