Reputation: 83
I have a pandas dataframe (Interactions dataframe) with columns as User, Item, Rating.
Ratings ItemID UserID
1 1172952 A74
1 1178735 176
4 341785 70C
3 136771 67E
2 1178883 383
Let's say I have two more dataframes with 200 users and 1000 ietms respectively. The problem is that in interactions dataframe, i need ratings for each user and each item combination. 0 rating for user/item combination if there is no interaction available for that user and item in interactions dataframe.
I have tried using a loop like this:
item_ids = np.repeat(item_data.id.values, len(user_data.id.values))
user_ids = np.tile(user_data.id.values, len(item_data.id.values))
ratings = np.empty([len(item_ids)])
for i in range(len(ratings)):
inter = interactions.loc[(interactions['UserID'] == user_ids[i]) & (interactions["ItemID"] == item_ids[i]), "Ratings"]
if not inter.empty:
ratings[i] = inter.values[0]
else:
ratings[i] = 0
interactions = np.stack((ratings, item_ids, user_ids), axis=-1)
But it takes 40 seconds to complete for a ratings array of just 30,000 rows. Is there a quick way of doing this? Thanks for help.
Upvotes: 2
Views: 211
Reputation: 57033
Your explanation of the problem is a little sloppy, but I have a feeling that you need this:
interactions.set_index(['ItemID','UserID'])\
.unstack().fillna(0).astype(int).stack()\
.reset_index()
This code creates a rectangular table of users and items, fills the voids with zeros, and converts the table back into a "tall" vector. Output:
ItemID UserID Ratings
0 136771 176 0
1 136771 383 0
2 136771 67E 3
3 136771 70C 0
4 136771 A74 0
5 341785 176 0
6 341785 383 0
7 341785 67E 0
8 341785 70C 4
9 341785 A74 0
10 1172952 176 0
....
I assume that each item and each user (but not their combinations!) are referenced in the interaction
table at least once. If not, some merge
'ing with the other two tables is needed.
Upvotes: 2