johnnydoe

Reputation: 402

How much two columns match based on another column

If my dataframe looks like this:

user   item   real   value  predict
  u1     i1    0.0    0.31      0.0
  u2     i1    1.0    0.50      0.0
  u1     i2    0.0    0.27      0.0
  u3     i2    0.0    0.91      0.0
  u1     i3    1.0    0.71      1.0
  u3     i3    0.0    0.80      1.0

How can I determine how accurate predict is compared to real for every single user? So for example:

u1   1.00
u2   0.00
u3   0.50

I was thinking of grouping by user, splitting the dataframe into one piece per user, transforming those two columns into lists, and then seeing how much they match. But I have thousands of users. Is there a better way to do it?
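For reference, a minimal sketch of that loop-based idea (assuming the dataframe is called df as shown above) would be:

# loop over each user's group and compute the fraction of rows where real == predict
per_user_acc = {}
for user, grp in df.groupby('user'):
    per_user_acc[user] = (grp['real'] == grp['predict']).mean()

This gives the right numbers, but it loops in Python over every group; the answers below let pandas do the grouping and averaging in a single call instead.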

Upvotes: 0

Views: 135

Answers (2)

user7864386

Reputation:

How about this? Since it's a classification problem, a simple per-user match rate (accuracy) would work.

Create another column Diff which is True if real and predict match, False otherwise; then groupby on user and find the mean value of Diff for each user:

out = df.assign(Diff=df['real']==df['predict']).groupby('user')['Diff'].mean()

Output:

user
u1    1.0
u2    0.0
u3    0.5
Name: Diff, dtype: float64
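If you want the two-column layout from the question rather than a Series, one possible follow-up step (the column name accuracy is just a suggestion) is:

result = out.rename('accuracy').reset_index()  # DataFrame with columns: user, accuracy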

Upvotes: 1

Corralien

Reputation: 120439

If you use sklearn, you can easily use mean_squared_error. Because real and predict only take the values 0 and 1, the mean squared error is exactly the fraction of mismatches, so 1 - MSE is the per-user accuracy:

from sklearn.metrics import mean_squared_error

mse = df.groupby('user').apply(lambda x: mean_squared_error(x['real'], x['predict']))
acc = 1 - mse
print(acc)

# Output:
user
u1    1.0
u2    0.0
u3    0.5
dtype: float64

Note: if you can't or don't want to use sklearn, you can use this instead:

import numpy as np

mean_square_error = lambda r, p: (np.linalg.norm(r - p)**2) / len(r)
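As a quick usage sketch, the same groupby pattern from above works with this drop-in replacement (assuming df is the dataframe from the question):

mse = df.groupby('user').apply(lambda x: mean_square_error(x['real'], x['predict']))
acc = 1 - mse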

Upvotes: 1
