modulitos

Reputation: 15804

Python: using int() on a string that is not an integer literal

Note: I was using the wrong source file for my data; once that was fixed, my issue was resolved. It turns out there is no simple way to use int(..) on a string that is not an integer literal.

This is an example from the book "Machine Learning In Action", and I cannot quite figure out what is wrong. Here's some background:

from numpy import zeros

def file2matrix(filename):
    # First pass: count the lines so the matrix can be preallocated.
    with open(filename) as fr:
        numberOfLines = len(fr.readlines())
    returnMat = zeros((numberOfLines, 3))
    classLabelVector = []
    # Second pass: parse each tab-delimited line.
    with open(filename) as fr:
        for index, line in enumerate(fr):
            listFromLine = line.strip().split('\t')
            # NumPy converts the three numeric strings to floats on assignment.
            returnMat[index, :] = listFromLine[0:3]
            classLabelVector.append(int(listFromLine[-1]))  # Problem here.
    return returnMat, classLabelVector

The .txt file is as follows:

40920   8.326976    0.953952    largeDoses
14488   7.153469    1.673904    smallDoses
26052   1.441871    0.805124    didntLike
75136   13.147394   0.428964    didntLike
38344   1.669788    0.134296    didntLike
...

I am getting an error on the line classLabelVector.append(int(listFromLine[-1])) because, I believe, int(..) is trying to parse a string (i.e. "largeDoses") that is not an integer literal. Am I missing something?

I looked up the documentation for int(), but it only seems to parse numbers and strings that contain integer literals:

http://docs.python.org/2/library/functions.html#int
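A quick check of what int() accepts when given a string (Python 3 shown; the linked page is the Python 2 documentation, but these cases behave the same way). Surrounding whitespace is ignored and a base can be passed explicitly, but a word like "largeDoses" or a decimal string like "9.4" raises ValueError. The helper name try_int is hypothetical, used only for illustration.

```python
def try_int(text):
    """Hypothetical helper: return int(text), or None if text is not an integer literal."""
    try:
        return int(text)
    except ValueError:
        return None

print(int("42"))              # 42
print(int(" 7\n"))            # 7: leading/trailing whitespace is stripped
print(int("ff", 16))          # 255: base given explicitly
print(try_int("largeDoses"))  # None: not an integer literal
print(try_int("9.4"))         # None: int() does not parse decimal strings
```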

Also, an excerpt from the book explains this section as follows:

Finally, you loop over all the lines in the file and strip off the return line character with line.strip(). Next, you split the line into a list of elements delimited by the tab character: '\t'. You take the first three elements and shove them into a row of your matrix, and you use the Python feature of negative indexing to get the last item from the list to put into classLabelVector. You have to explicitly tell the interpreter that you’d like the integer version of the last item in the list, or it will give you the string version. Usually, you’d have to do this, but NumPy takes care of those details for you.
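A small sketch of what the excerpt describes, assuming a reasonably recent NumPy and using the first sample row: assigning the three numeric strings into a row of a float matrix converts them automatically, while the label column stays a plain Python string until you convert it yourself, and int() can only do that for digit strings.

```python
import numpy as np

row = "40920\t8.326976\t0.953952\tlargeDoses".split("\t")

mat = np.zeros((1, 3))
mat[0, :] = row[0:3]   # NumPy casts the numeric strings to float64 here
print(mat[0])          # the first three fields, now floats

label = row[-1]
print(type(label).__name__, label)  # str largeDoses
# int(label) would raise ValueError, because "largeDoses" is not an integer literal.
```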

Upvotes: 2

Views: 446

Answers (2)

zhangxaochen

Reputation: 34027

Strings like "largeDoses" cannot be converted to integers. The folder Ch02 of the book's code contains two data files; load the second one, datingTestSet2.txt, instead of the first.
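If you would rather keep the file with text labels, one alternative (not part of the answer above) is to map each distinct label to an integer code yourself. In this sketch the helper label_to_int is hypothetical and codes are assigned in order of first appearance, which is an arbitrary choice and not necessarily the numbering used in datingTestSet2.txt.

```python
def label_to_int(labels):
    """Hypothetical helper: give each distinct text label an integer code (starting at 1)."""
    codes = {}
    result = []
    for label in labels:
        if label not in codes:
            codes[label] = len(codes) + 1  # next unused code
        result.append(codes[label])
    return result, codes

labels = ["largeDoses", "smallDoses", "didntLike", "didntLike", "didntLike"]
encoded, mapping = label_to_int(labels)
print(encoded)  # [1, 2, 3, 3, 3]
print(mapping)  # {'largeDoses': 1, 'smallDoses': 2, 'didntLike': 3}
```

Inside file2matrix you would collect listFromLine[-1] as-is and encode the whole list at the end, instead of calling int() on each label.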

Upvotes: 3

Dysosmus

Reputation: 832

You can use ast.literal_eval and catch the ValueError it raises for a malformed string (by the way, int('9.4') will also raise an exception).
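A minimal sketch of that suggestion; the wrapper name parse_label is hypothetical. Besides ValueError, it also catches SyntaxError, which literal_eval raises for text that is not valid Python syntax at all (for example a label containing a space). Note that literal_eval accepts any Python literal, so "9.4" comes back as a float; check the type if you need an int.

```python
import ast

def parse_label(text):
    """Hypothetical helper: evaluate text as a Python literal, or return None if it is not one."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

print(parse_label("3"))           # 3
print(parse_label("9.4"))         # 9.4 (a float, not an int)
print(parse_label("largeDoses"))  # None
```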

Upvotes: 1
