Reputation: 161
I got confused about computing AUC (area under the curve) to evaluate recommendation system results.
If we have cross-validation data like (user, product, rating), how do we choose positive and negative samples for each user to compute AUC?
Is it good to choose the products that occur for each user in the dataset as positive samples, and the rest that do not occur as negative samples? I think this way cannot identify the "real" negative samples, because the user still has a chance to like the products placed in the negative samples.
Upvotes: 12
Views: 6827
Reputation: 380
"A ROC curve plots recall (true positive rate) against fallout (false positive rate) for increasing recommendation set size." Schröder, Thiele, and Lehner 2011 (PDF)
In general, you will hold out a portion of your data as testing data. For a particular user, you would train on (for instance) 80% of her data and try to predict which items (out of all items in your dataset) she'll exhibit a preference for based on the remaining 20% of her data.
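A per-user hold-out split like the one described above might be sketched as follows; the function name, the 80/20 ratio, and the fixed seed are illustrative assumptions, not part of any standard API:

```python
import random

def split_user_items(items, train_frac=0.8, seed=0):
    """Randomly hold out a fraction of one user's items for testing.

    items: the list of items this user has interacted with.
    Returns (train_items, test_items). The 80/20 split is just an
    example ratio, as in the text above.
    """
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, test = split_user_items(list(range(10)))
```

You would train the recommender on `train` and then check its Top-N list against `test`.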
Let's say you're building a Top-20 recommender. The 20 items you recommend for a user are the Positive items, and the unrecommended items are Negative. Then:

- True Positives are the items in your Top-N list that match what the user preferred in her held-out testing set.
- False Positives are the items in your Top-N list that don't match her preferred items in her held-out testing set.
- True Negatives are the items you didn't include in your Top-N recommendations and that the user didn't have in her preferred items in her held-out testing set.
- False Negatives are the items you didn't include in your Top-N recommendations but that do match what the user preferred in her held-out testing set.

That's the confusion matrix. Now you can vary the number of items you recommend, calculate the confusion matrix for each N, calculate recall and fallout for each, and plot the ROC.
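The procedure above can be sketched in a few lines; the function names here are made up for illustration, and `all_items` is assumed to be the full item catalogue:

```python
def recall_fallout(recommended, relevant, all_items):
    """Recall (TPR) and fallout (FPR) for one Top-N recommendation list.

    recommended: items in the Top-N list (the Positive items)
    relevant:    items the user preferred in her held-out testing set
    all_items:   every item in the dataset
    """
    rec, rel = set(recommended), set(relevant)
    tp = len(rec & rel)              # recommended and preferred
    fp = len(rec - rel)              # recommended but not preferred
    fn = len(rel - rec)              # preferred but not recommended
    tn = len(all_items - rec - rel)  # neither recommended nor preferred
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fallout = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fallout

def roc_points(ranked_items, relevant, all_items):
    """Vary N over the ranked list to get one (recall, fallout) point per N."""
    return [recall_fallout(ranked_items[:n], relevant, all_items)
            for n in range(1, len(ranked_items) + 1)]
```

Plotting fallout on the x-axis against recall on the y-axis for each point gives the ROC curve, and the area under it is the AUC the question asks about.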
Upvotes: 13