marth17

Reputation: 43

Floats not behaving like floats in a predefined function

I am trying to compute the precision and recall of a spam filter using predefined functions.

When I call these functions, they never return anything other than 1.0.

I know that this is not correct because I am supposed to get a precision result of 0.529411764706.

Also, I am using pop(0) because for some reason the first entry of each list is not a number, so I can't use append(int(...

Here are the functions:

def precision(ref, hyp):
    """Calculates precision.
    Args:
    - ref: a list of 0's and 1's extracted from a reference file
    - hyp: a list of 0's and 1's extracted from a hypothesis file
    Returns:
    - A floating point number indicating the precision of the hypothesis
    """
    (n, np, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(hyp[i]):
            np += 1
            if bool(ref[i]):
                ntp += 1
    return ntp/np

def recall(ref, hyp):
    """Calculates recall.
    Args:
    - ref: a list of 0's and 1's extracted from a reference file
    - hyp: a list of 0's and 1's extracted from a hypothesis file
    Returns:
    - A floating point number indicating the recall rate of the hypothesis
    """
    (n, nt, ntp) = (len(ref), 0.0, 0.0)
    for i in range(n):
        if bool(ref[i]):
            nt += 1
            if bool(hyp[i]):
                ntp += 1
    return ntp/nt

Here's my code:

import hw10_lib
from hw10_lib import precision
from hw10_lib import recall

actual = []
for line in open("/path/hw10.ref", 'r'):
    actual.append(line.strip().split('\t')[-1])
actual.pop(0)

predicted = []
for line in open("/path/hw10.hyp", 'r'):
    predicted.append(line.strip().split('\t')[-1])
predicted.pop(0)

prec = precision(actual, predicted)
rec = recall(actual, predicted)

print ('Precision: ', prec)
print ('Recall: ', rec)

Upvotes: 2

Views: 77

Answers (1)

jdi

Reputation: 92569

You are passing strings to functions that expect numbers. bool(aString) is True for any non-empty string, including "0".

Convert the fields to numbers (int or float), either before you pass the lists to your functions or inside the functions as you loop over the values.

bool("0") # True
bool("1") # True

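If every label is truthy, every item is counted as both a predicted positive and a true positive, so ntp equals np and the result is always 1.0 (1 / 1 == 1 and 100 / 100 == 1).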
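
The effect can be reproduced with a minimal sketch. The precision function below copies the logic from the question (with renamed counters), and the label lists are made up for illustration; they are not the hw10 data.

```python
def precision(ref, hyp):
    # Same logic as the precision function quoted in the question.
    n, n_pred, n_true_pos = len(ref), 0.0, 0.0
    for i in range(n):
        if bool(hyp[i]):
            n_pred += 1
            if bool(ref[i]):
                n_true_pos += 1
    return n_true_pos / n_pred

# Made-up labels for illustration only (not the hw10 data).
ref_strings = ["1", "0", "1", "0"]
hyp_strings = ["1", "1", "0", "0"]

# Every non-empty string is truthy, so all four items count as true positives.
as_strings = precision(ref_strings, hyp_strings)

# After conversion to int, only the real 1s are truthy.
ref_ints = [int(x) for x in ref_strings]
hyp_ints = [int(x) for x in hyp_strings]
as_ints = precision(ref_ints, hyp_ints)

print(as_strings)  # 1.0
print(as_ints)     # 0.5
```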

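Also make sure at least one operand of the division is a float: in Python 2, int / int discards the fractional part. Your counters already start at 0.0, so that part is fine as long as it stays that way.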
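
As a rough illustration of the division point: the counts 9 and 17 below are hypothetical, chosen only because their ratio matches the expected value quoted in the question; the real counts are not given. The // operator is used to show the truncating behaviour that a plain / has on two ints in Python 2.

```python
# Hypothetical counts, for illustration only.
n_true_pos = 9
n_pred = 17

# Floor division: what int / int gives in Python 2.
truncated = n_true_pos // n_pred

# Converting one operand to float keeps the fractional part.
exact = float(n_true_pos) / n_pred

print(truncated)
print(exact)
```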

    for i in range(n):
        if float(hyp[i]):
            np += 1.0
            if float(ref[i]):
                ntp += 1.0
    return ntp/np

Alternatively, convert the values as you build the list:

actual = []
with open("/path/hw10.ref", 'r') as f:
    for line in f:
        try:
            val = float(line.strip().split('\t')[-1])
        except ValueError:
            # Skip the header row (or any other non-numeric field)
            continue
        actual.append(val)

The list then contains only valid floats, so there is no need to pop.
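
Since the same parsing is needed for both the .ref and the .hyp file, it could be pulled into a small helper. This is only a sketch: parse_labels is a hypothetical name, and the sample lines (including the header text) are invented, since the actual file contents are not shown in the question.

```python
def parse_labels(lines):
    """Return the last tab-separated field of each line as a float,
    skipping lines (such as a header) whose last field is not numeric."""
    labels = []
    for line in lines:
        field = line.strip().split('\t')[-1]
        try:
            labels.append(float(field))
        except ValueError:
            # Header row, blank line, or any other non-numeric field.
            continue
    return labels

# Invented sample lines standing in for a file's contents.
sample = ["id\tlabel\n", "a\t1\n", "b\t0\n", "c\t1\n"]
labels = parse_labels(sample)
print(labels)  # [1.0, 0.0, 1.0]
```

With real files, an open file object can be passed directly, for example inside a with open("/path/hw10.ref") as f: block, because iterating over a file yields its lines.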

Upvotes: 1
