I am trying to compute the precision and recall of a spam filter using predefined functions.
When I use these functions, I cannot get them to return anything other than 1.0.
I know this is not correct, because I am supposed to get a precision of 0.529411764706.
Also, I am using pop because, for some reason, the first entry of each list is not a number, so I can't use append(int(...
Here are the functions:
```python
def precision(ref, hyp):
    """Calculates precision.

    Args:
      - ref: a list of 0's and 1's extracted from a reference file
      - hyp: a list of 0's and 1's extracted from a hypothesis file
    Returns:
      - A floating point number indicating the precision of the hypothesis
    """
    (n, np, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(hyp[i]):
            np += 1
            if bool(ref[i]):
                ntp += 1
    return ntp/np


def recall(ref, hyp):
    """Calculates recall.

    Args:
      - ref: a list of 0's and 1's extracted from a reference file
      - hyp: a list of 0's and 1's extracted from a hypothesis file
    Returns:
      - A floating point number indicating the recall rate of the hypothesis
    """
    (n, nt, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(ref[i]):
            nt += 1
            if bool(hyp[i]):
                ntp += 1
    return ntp/nt
```
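For reference, these two functions compute the standard definitions. Here TP counts positions where hyp and ref are both 1, FP positions where hyp is 1 and ref is 0, and FN positions where hyp is 0 and ref is 1; np, nt and ntp are the counters used in the code above:

```latex
\text{precision} = \frac{TP}{TP + FP} = \frac{\mathit{ntp}}{\mathit{np}},
\qquad
\text{recall} = \frac{TP}{TP + FN} = \frac{\mathit{ntp}}{\mathit{nt}}
```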
Here's my code:
```python
import hw10_lib
from hw10_lib import precision
from hw10_lib import recall

actual = []
for line in open("/path/hw10.ref", 'r'):
    actual.append(line.strip().split('\t')[-1])
actual.pop(0)

predicted = []
for line in open("/path/hw10.hyp", 'r'):
    predicted.append(line.strip().split('\t')[-1])
predicted.pop(0)

prec = precision(actual, predicted)
rec = recall(actual, predicted)
print ('Precision: ', prec)
print ('Recall: ', rec)
```
You are treating strings as numbers in your functions. bool(aString) is True for any non-empty string, including "0".
Convert your valid fields to float either before you pass them to your functions, or within the functions when you loop over the values.
```python
bool("0")  # True
bool("1")  # True
```
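If every value is truthy, every predicted positive is also counted as a true positive, so np == ntp and ntp/np is always 1.0 (1 / 1 == 1, 100 / 100 == 1, and so on).
Also make sure you divide floats rather than ints: in Python 2, int / int truncates (for example 3 / 4 == 0). Starting the counters at 0.0, as these functions already do, avoids that.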
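A small reproduction of the 1.0 result, using a copy of the question's precision() loop on hypothetical sample labels (the data below is made up for illustration). The same labels give 1.0 as strings but the expected ratio as ints:

```python
def precision(ref, hyp):
    """Copy of the question's precision() loop, for demonstration."""
    (n, np, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(hyp[i]):
            np += 1
            if bool(ref[i]):
                ntp += 1
    return ntp / np

# Hypothetical labels, as strings (as read from a file) and as ints.
ref_str = ["1", "0", "1", "0"]
hyp_str = ["1", "1", "0", "1"]
ref_int = [int(x) for x in ref_str]
hyp_int = [int(x) for x in hyp_str]

print(precision(ref_str, hyp_str))  # 1.0: every non-empty string is truthy
print(precision(ref_int, hyp_int))  # 1/3: only index 0 is a true positive
```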
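```python
def precision(ref, hyp):
    (n, np, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if float(hyp[i]):      # float("0") == 0.0 is falsy, float("1") == 1.0 is truthy
            np += 1.0
            if float(ref[i]):
                ntp += 1.0
    return ntp / np
```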
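Applying the same float() conversion to recall() as well, a self-contained check on string labels might look like the sketch below. It iterates with zip() instead of range(len(ref)) purely as a stylistic choice (note that zip() silently stops at the shorter list); the sample labels are hypothetical:

```python
def precision(ref, hyp):
    # float() turns "0"/"1" into 0.0/1.0, so string labels are counted correctly.
    np, ntp = 0.0, 0.0
    for r, h in zip(ref, hyp):
        if float(h):
            np += 1.0
            if float(r):
                ntp += 1.0
    return ntp / np

def recall(ref, hyp):
    nt, ntp = 0.0, 0.0
    for r, h in zip(ref, hyp):
        if float(r):
            nt += 1.0
            if float(h):
                ntp += 1.0
    return ntp / nt

# Hypothetical string labels, as they would come out of split('\t').
ref = ["1", "0", "1", "0"]
hyp = ["1", "1", "0", "1"]
print(precision(ref, hyp))  # 1/3
print(recall(ref, hyp))     # 0.5
```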
Alternatively, convert the values as you build the lists, skipping any line that doesn't parse:
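```python
actual = []
with open("/path/hw10.ref", 'r') as f:
    for line in f:
        try:
            val = float(line.strip().split('\t')[-1])
        except ValueError:
            continue  # skip lines whose last field is not a number (e.g. the first line)
        actual.append(val)
```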
Then you will have only valid floats and no need to pop.
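The same idea can be wrapped in a small helper so both files are read identically. This is a sketch: read_labels is a hypothetical name, it converts to int rather than float (the labels are 0/1), and io.StringIO stands in for the real files with made-up contents:

```python
import io

def read_labels(lines):
    """Return the last tab-separated field of each line as an int,
    skipping lines (such as a header) whose last field is not a number."""
    labels = []
    for line in lines:
        try:
            labels.append(int(line.strip().split('\t')[-1]))
        except ValueError:
            continue
    return labels

# Hypothetical file contents: a header line followed by id<TAB>label rows.
ref_file = io.StringIO("id\tlabel\n1\t1\n2\t0\n3\t1\n4\t0\n")
hyp_file = io.StringIO("id\tlabel\n1\t1\n2\t1\n3\t0\n4\t1\n")

actual = read_labels(ref_file)
predicted = read_labels(hyp_file)
print(actual)     # [1, 0, 1, 0]
print(predicted)  # [1, 1, 0, 1]
```

With real files you would pass the open file object instead, e.g. `with open("/path/hw10.ref") as f: actual = read_labels(f)`. Because the lists then hold ints, the original bool()-based precision() and recall() also work unchanged, since bool(0) is False.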