I am trying to create a show recommender using information about each show, so I believe this is content-based. I want a person to select a show they have watched and be recommended similar shows based on their content.
Currently my data file looks like this:
Code Genre
1260064148537,NOGENRE
1260064149243,Drama
1260064149741,Spoof
1260064764631,Classical
12600647412748,HipHopRnB&Dancehall
126006483593,NOGENRE
1260065049943,NOGENRE
12600705429,Sketch
1260070324431,News
126007032486,Sport
...
I have written my own ItemSimilarity to find similarities between genres, but what I don't know is how to use a DataModel with my data, since each row is a Long and a String, and then how to pass that to a recommender. Do I have to write my own DataModel? If so, how do I go about it?
The first question is whether you have any other data connecting users to shows. If you don't, then you don't actually have a recommender problem. This is just a similar-items problem: you recommend stuff similar to what the user is looking at now.
Of course, you have to define similarity. If all you have is a single label for each show, there's not a lot you can do except say two shows are similar when they have the same label, and not otherwise. You can use ItemSimilarity and iterate over all items (perhaps precomputing this) to find the items most similar to the current one.
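A minimal plain-Java sketch of that brute-force loop is below. ShowSimilarity is a made-up interface standing in for Mahout's ItemSimilarity, so the example runs without Mahout on the classpath. The genre rule (1.0 for a shared genre, 0.0 otherwise, with NOGENRE never matching) is an assumption, and the last data row uses a made-up ID so that something actually matches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SimilarShows {

    // Hypothetical stand-in for Mahout's ItemSimilarity, so this runs without Mahout.
    interface ShowSimilarity {
        double similarity(long showA, long showB);
    }

    // Parses "code,genre" rows (no header line) into an insertion-ordered map.
    static Map<Long, String> parseGenres(List<String> rows) {
        Map<Long, String> genres = new LinkedHashMap<>();
        for (String row : rows) {
            String[] parts = row.split(",", 2);
            genres.put(Long.parseLong(parts[0].trim()), parts[1].trim());
        }
        return genres;
    }

    // Assumed rule: 1.0 when two shows share a genre, else 0.0; NOGENRE never matches.
    static ShowSimilarity sameGenre(Map<Long, String> genres) {
        return (a, b) -> {
            String genreA = genres.get(a);
            if (genreA == null || genreA.equals("NOGENRE")) {
                return 0.0;
            }
            return genreA.equals(genres.get(b)) ? 1.0 : 0.0;
        };
    }

    // Brute force: score the target against every other show, keep the top n non-zero.
    static List<Long> mostSimilar(long target, Map<Long, String> genres,
                                  ShowSimilarity sim, int n) {
        List<Long> candidates = new ArrayList<>();
        for (long id : genres.keySet()) {
            if (id != target && sim.similarity(target, id) > 0.0) {
                candidates.add(id);
            }
        }
        // List.sort is stable, so shows with equal scores keep their file order.
        candidates.sort(Comparator.comparingDouble((Long id) -> sim.similarity(target, id)).reversed());
        return candidates.subList(0, Math.min(n, candidates.size()));
    }

    public static void main(String[] args) {
        Map<Long, String> genres = parseGenres(List.of(
                "1260064149243,Drama",
                "1260070324431,News",
                "126007032486,Sport",
                "1000000000001,Drama")); // last row is a made-up ID
        System.out.println(mostSimilar(1260064149243L, genres, sameGenre(genres), 5)); // [1000000000001]
    }
}
```

To precompute, you would call mostSimilar once per show and cache the lists; that costs on the order of n squared similarity calls, which is fine for a catalogue of this size.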
Of course, if your similarity is just 0 or 1 depending on whether they share a label, that's not even a similarity problem. It's just search. Find things in the same category and you're done.
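To make the "it's just search" point concrete, here is a rough sketch that builds a genre-to-shows index once and then answers "similar shows" with a single lookup, no similarity function at all. GenreIndex and its methods are hypothetical names, and treating NOGENRE as "no matches" is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GenreIndex {
    private final Map<String, List<Long>> showsByGenre = new HashMap<>();
    private final Map<Long, String> genreByShow = new HashMap<>();

    // Files the show under its genre in both directions.
    void add(long showId, String genre) {
        genreByShow.put(showId, genre);
        showsByGenre.computeIfAbsent(genre, g -> new ArrayList<>()).add(showId);
    }

    // Returns every other show filed under the same genre as showId.
    List<Long> sameGenreAs(long showId) {
        String genre = genreByShow.get(showId);
        List<Long> result = new ArrayList<>();
        if (genre == null || genre.equals("NOGENRE")) {
            return result; // unknown show, or a label that carries no information
        }
        for (long other : showsByGenre.get(genre)) {
            if (other != showId) {
                result.add(other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        GenreIndex index = new GenreIndex();
        index.add(1260064149243L, "Drama");
        index.add(126007032486L, "Sport");
        index.add(1000000000001L, "Drama"); // made-up ID for illustration
        System.out.println(index.sameGenreAs(1260064149243L)); // [1000000000001]
    }
}
```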
A recommender comes into play when you have user-item data at heart. You can use this kind of data to build an ItemSimilarity and then use that plus the user-item data (maybe these are view counts, etc.) to build the recommender. But I also think you should evaluate whether you can get richer label data; if so, you can certainly make better similarity metrics.
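As a rough illustration of that flow, the sketch below derives item-item similarity from made-up "user watched show" data using the Tanimoto (Jaccard) coefficient of each show's viewer set, then scores a user's unseen shows by summing their similarity to the shows the user has seen. Mahout's Taste package ships comparable pieces (for example TanimotoCoefficientSimilarity and GenericItemBasedRecommender), but this is a plain-Java sketch of the idea, not Mahout's implementation; all class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ItemBasedSketch {

    // Inverts user -> watched shows into show -> users who watched it.
    static Map<Long, Set<Long>> viewersByShow(Map<Long, Set<Long>> showsByUser) {
        Map<Long, Set<Long>> viewers = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : showsByUser.entrySet()) {
            for (long show : e.getValue()) {
                viewers.computeIfAbsent(show, s -> new HashSet<>()).add(e.getKey());
            }
        }
        return viewers;
    }

    // Tanimoto (Jaccard) coefficient: size(A intersect B) / size(A union B).
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    // Scores each show the user has not watched by summing its similarity
    // to every show the user has watched, then returns the top n.
    static List<Long> recommend(long user, Map<Long, Set<Long>> showsByUser, int n) {
        Map<Long, Set<Long>> viewers = viewersByShow(showsByUser);
        Set<Long> seen = showsByUser.getOrDefault(user, Set.of());
        Map<Long, Double> scores = new HashMap<>();
        for (long candidate : viewers.keySet()) {
            if (seen.contains(candidate)) {
                continue;
            }
            double score = 0.0;
            for (long watched : seen) {
                score += tanimoto(viewers.get(candidate), viewers.get(watched));
            }
            if (score > 0.0) {
                scores.put(candidate, score);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return ranked.subList(0, Math.min(n, ranked.size()));
    }

    public static void main(String[] args) {
        // Made-up viewing data: user ID -> IDs of shows watched.
        Map<Long, Set<Long>> showsByUser = new HashMap<>();
        showsByUser.put(1L, Set.of(10L, 20L));
        showsByUser.put(2L, Set.of(10L, 20L, 30L));
        showsByUser.put(3L, Set.of(20L, 40L));
        showsByUser.put(4L, Set.of(40L));
        System.out.println(recommend(1L, showsByUser, 5)); // [30, 40]
    }
}
```

Show 30 wins for user 1 because both of its viewers' overlaps are with shows user 1 already watched; show 40 only overlaps weakly through show 20. Swapping tanimoto for a similarity that uses view counts would be the natural next step if you have them.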
(This isn't input you would use for a DataModel, for the reasons above. I should also note that you can't use string identifiers; they have to be numbers. It's possible to use strings with some additional work, but it's not really worth it.)
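The "additional work" mostly amounts to keeping a two-way mapping between your string IDs and numeric ones. If I remember right, Mahout's Taste API has an IDMigrator interface for this; the sketch below just does it by hand with a hypothetical StringIdMapper class that hands out sequential longs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: hands out sequential long IDs for string keys and keeps
// the reverse mapping so numeric results can be translated back.
class StringIdMapper {
    private final Map<String, Long> idByKey = new HashMap<>();
    private final List<String> keyById = new ArrayList<>();

    // Returns the existing ID for key, or allocates the next free one.
    long toLongId(String key) {
        Long existing = idByKey.get(key);
        if (existing != null) {
            return existing;
        }
        long id = keyById.size();
        keyById.add(key);
        idByKey.put(key, id);
        return id;
    }

    // Reverse lookup; only valid for IDs this mapper handed out.
    String toStringId(long id) {
        return keyById.get((int) id);
    }

    public static void main(String[] args) {
        StringIdMapper users = new StringIdMapper();
        long alice = users.toLongId("alice"); // made-up user names
        long bob = users.toLongId("bob");
        System.out.println(alice + " " + bob + " " + users.toStringId(bob)); // 0 1 bob
    }
}
```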