If my dataframe looks like this:
user  item  real  value  predict
u1    i1    0.0   0.31   0.0
u2    i1    1.0   0.50   0.0
u1    i2    0.0   0.27   0.0
u3    i2    0.0   0.91   0.0
u1    i3    1.0   0.71   1.0
u3    i3    0.0   0.80   1.0
How can I determine how accurate predict is compared to real for every single user? For example, I'd like to get:
u1 1.00
u2 0.00
u3 0.50
I was thinking of grouping by user, splitting the dataframe into one dataframe per user, turning those two columns into lists and then checking how many entries match. But I have thousands of users. Is there a better way to do it?
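The idea above does not actually require splitting the dataframe: a single pass that counts matches and rows per user is enough, and it scales linearly with the number of rows regardless of how many users there are. A minimal stdlib sketch, assuming the rows have been pulled out as hypothetical (user, real, predict) tuples; per_user_accuracy is an illustrative name, not a library function:

```python
from collections import defaultdict

# Hypothetical input: one (user, real, predict) tuple per row of the dataframe.
rows = [
    ("u1", 0.0, 0.0),
    ("u2", 1.0, 0.0),
    ("u1", 0.0, 0.0),
    ("u3", 0.0, 0.0),
    ("u1", 1.0, 1.0),
    ("u3", 0.0, 1.0),
]

def per_user_accuracy(rows):
    """Return {user: fraction of rows where real == predict}, in one pass."""
    matches = defaultdict(int)  # number of correct predictions per user
    totals = defaultdict(int)   # number of rows per user
    for user, real, predict in rows:
        totals[user] += 1
        matches[user] += real == predict  # a bool adds as 0 or 1
    return {user: matches[user] / totals[user] for user in totals}

print(per_user_accuracy(rows))
# {'u1': 1.0, 'u2': 0.0, 'u3': 0.5}
```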
How about this? Since it's a classification problem, accuracy (the share of rows where predict equals real) is a suitable metric.
Create another column Diff that is True if real and predict match and False otherwise; then group by user and take the mean of Diff for each user (the mean of a boolean column is the fraction of True values):
out = df.assign(Diff=df['real']==df['predict']).groupby('user')['Diff'].mean()
Output:
user
u1 1.0
u2 0.0
u3 0.5
Name: Diff, dtype: float64
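A minimal self-contained sketch of this answer, assuming pandas is installed: it rebuilds the question's sample data and runs the same one-liner. The helper column is renamed here to Match for readability, since it is True when the values agree.

```python
import pandas as pd

# Rebuild the sample dataframe from the question.
df = pd.DataFrame({
    "user":    ["u1", "u2", "u1", "u3", "u1", "u3"],
    "item":    ["i1", "i1", "i2", "i2", "i3", "i3"],
    "real":    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "value":   [0.31, 0.50, 0.27, 0.91, 0.71, 0.80],
    "predict": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
})

# Boolean column: True where the prediction equals the label.
# Averaging it per user gives the fraction of correct predictions.
out = (
    df.assign(Match=df["real"].eq(df["predict"]))
      .groupby("user")["Match"]
      .mean()
)
print(out)
```

This is one vectorized groupby, so it handles thousands of users without splitting the dataframe manually.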
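If you use sklearn, you could use mean_squared_error. Because real and predict only take the values 0 and 1, each squared error is 0 for a match and 1 for a mismatch, so the per-user MSE is the error rate and 1 - MSE is the accuracy: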
from sklearn.metrics import mean_squared_error
mse = df.groupby('user').apply(lambda x: mean_squared_error(x['real'], x['predict']))
acc = 1 - mse
print(acc)
# Output:
user
u1 1.0
u2 0.0
u3 0.5
dtype: float64
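The 1 - MSE step relies on both columns containing only 0 and 1. A small pure-Python check (the helpers mse and accuracy are illustrative, not sklearn functions) shows that it matches accuracy for binary labels but not for hypothetical labels outside {0, 1}:

```python
def mse(real, predict):
    """Mean squared error of two equal-length sequences."""
    return sum((r - p) ** 2 for r, p in zip(real, predict)) / len(real)

def accuracy(real, predict):
    """Fraction of positions where the two sequences agree."""
    return sum(r == p for r, p in zip(real, predict)) / len(real)

# Binary labels (user u3 from the question): each squared error is 0 or 1,
# so MSE is the error rate and 1 - MSE equals the accuracy.
print(1 - mse([0.0, 0.0], [0.0, 1.0]), accuracy([0.0, 0.0], [0.0, 1.0]))  # 0.5 0.5

# Hypothetical labels outside {0, 1}: a miss by 2 adds 4 to the sum,
# so 1 - MSE no longer equals the accuracy.
print(1 - mse([0, 2], [2, 2]), accuracy([0, 2], [2, 2]))  # -1.0 0.5
```

For multi-class labels, the match-rate approach from the other answer is the safer choice.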
Note: if you can't or don't want to use sklearn, define this drop-in replacement instead (it needs NumPy):
import numpy as np
mean_squared_error = lambda r, p: (np.linalg.norm(r - p)**2) / len(r)
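Put together, a minimal sketch of the sklearn-free version on the question's sample data, assuming pandas and NumPy are installed. The function below is a local stand-in, not sklearn's implementation, and it ignores sklearn's extra parameters such as sample_weight:

```python
import numpy as np
import pandas as pd

def mean_squared_error(r, p):
    """Local stand-in for sklearn's mean_squared_error (uniform weights only)."""
    r = np.asarray(r, dtype=float)
    p = np.asarray(p, dtype=float)
    # The squared Euclidean norm of the residuals is the sum of squared errors.
    return np.linalg.norm(r - p) ** 2 / len(r)

df = pd.DataFrame({
    "user":    ["u1", "u2", "u1", "u3", "u1", "u3"],
    "real":    [0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "predict": [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
})

# Selecting the two columns keeps the grouping key out of apply().
mse = df.groupby("user")[["real", "predict"]].apply(
    lambda g: mean_squared_error(g["real"], g["predict"])
)
acc = 1 - mse  # valid as accuracy only because the labels are 0/1
print(acc)
```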