How does this method or formula for calculating ROC AUC work?

Question

I was trying to calculate the AUC in MySQL for data in a table like the one below:

y   p
1   0.872637
0   0.130633
0   0.098054
...
...
1   0.060190
0   0.110938

I came across the following SQL query, which gives the correct AUC score (I verified it against sklearn).

 SELECT (sum(y*r) - 0.5*sum(y)*(sum(y)+1)) / (sum(y) * sum(1-y)) AS auc
 FROM ( 
   SELECT y, row_number() OVER (ORDER BY p) r
   FROM probs
 ) t

Using pandas, this can be done as follows:

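 import numpy as np  # df is a pandas DataFrame with columns y and p

 temp = df.sort_values(by="p")            # ascending by predicted probability
 temp["r"] = np.arange(1, len(temp) + 1)  # 1-based rank, like row_number()
 n_pos, n_neg = temp["y"].sum(), (1 - temp["y"]).sum()
 rank_sum = (temp["y"] * temp["r"]).sum() # sum of ranks of the positive rows
 print((rank_sum - 0.5 * n_pos * (n_pos + 1)) / (n_pos * n_neg))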
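
One dependency-free way to check what the formula is measuring is to compare it with a brute-force count of (positive, negative) pairs in which the positive row has the higher score. The sketch below is only an illustration: the function names and the six-row data set are made up, and it assumes there are no tied scores (row_number breaks ties arbitrarily, so ties could make the two numbers differ).

```python
def auc_rank_formula(y, p):
    """Rank-sum formula from the query above; ranks are 1-based, ascending in p."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Sum of the ranks held by the positive rows (the sum(y*r) term).
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if y[i] == 1)
    return (rank_sum - 0.5 * n_pos * (n_pos + 1)) / (n_pos * n_neg)


def auc_pair_count(y, p):
    """Fraction of (positive, negative) pairs where the positive has the higher p."""
    pos = [score for label, score in zip(y, p) if label == 1]
    neg = [score for label, score in zip(y, p) if label == 0]
    wins = sum(1 for sp in pos for sn in neg if sp > sn)
    return wins / (len(pos) * len(neg))


# Made-up example data (not taken from the table in the question).
y = [1, 0, 0, 1, 0, 1]
p = [0.9, 0.1, 0.4, 0.35, 0.2, 0.8]
print(auc_rank_formula(y, p), auc_pair_count(y, p))
```

On this toy data both functions return 8/9, since 8 of the 9 positive/negative pairs are ordered correctly.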

I do not understand how this method calculates the AUC. Can somebody please give the intuition behind it?

I am already familiar with the trapezoidal method, which sums the areas of small trapezoids under the ROC curve.
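
For reference, a minimal sketch of that trapezoidal computation in plain Python. The function name and the example data are hypothetical, and the sketch assumes distinct scores, so the threshold is lowered one row at a time.

```python
def auc_trapezoid(y, p):
    """Area under the ROC curve by the trapezoidal rule, assuming distinct scores."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    # Sweep the threshold from the highest score down, one row at a time.
    order = sorted(range(len(p)), key=lambda i: p[i], reverse=True)
    tp = fp = 0
    tpr_prev = fpr_prev = 0.0
    area = 0.0
    for i in order:
        if y[i] == 1:
            tp += 1
        else:
            fp += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        # Trapezoid between the previous ROC point and the new one.
        area += (fpr - fpr_prev) * (tpr + tpr_prev) / 2
        tpr_prev, fpr_prev = tpr, fpr
    return area


# Made-up example data (not taken from the table in the question).
labels = [1, 0, 0, 1, 0, 1]
scores = [0.9, 0.1, 0.4, 0.35, 0.2, 0.8]
print(auc_trapezoid(labels, scores))
```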

Answers (1)
