Reputation: 15804
Note: I was using the wrong source file for my data; once that was fixed, my issue was resolved. It turns out there is no simple way to use int(..) on a string that is not an integer literal.
This is an example from the book "Machine Learning In Action", and I cannot quite figure out what is wrong. Here's some background:
from numpy import *
def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())  # count the lines to size the matrix
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    fr = open(filename)  # reopen to read from the start again
    index = 0
    for line in fr.readlines():
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1])) # Problem here.
        index += 1
    return returnMat,classLabelVector
The .txt file is as follows:
40920 8.326976 0.953952 largeDoses
14488 7.153469 1.673904 smallDoses
26052 1.441871 0.805124 didntLike
75136 13.147394 0.428964 didntLike
38344 1.669788 0.134296 didntLike
...
I am getting an error on the line classLabelVector.append(int(listFromLine[-1])) because, I believe, int(..) is trying to parse a string (i.e. "largeDoses") that is not an integer literal. Am I missing something?
I looked up the documentation for int(), but it only seems to parse numbers and integer literals:
http://docs.python.org/2/library/functions.html#int
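That reading of the documentation can be checked with a small experiment. This is only a minimal sketch; try_int is a hypothetical helper introduced for illustration and is not part of the book's code:

```python
def try_int(text):
    """Return int(text), or None if text is not an integer literal."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as "largeDoses" or "9.4"
        return None

print(try_int("3"))           # 3
print(try_int(" 42\n"))       # 42 (surrounding whitespace is ignored)
print(try_int("largeDoses"))  # None
print(try_int("9.4"))         # None
```

So int() accepts a string only when the string spells out an integer; a word like "largeDoses" always fails.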
Also, an excerpt from the book explains this section as follows:
Finally, you loop over all the lines in the file and strip off the return line character with line.strip(). Next, you split the line into a list of elements delimited by the tab character: '\t'. You take the first three elements and shove them into a row of your matrix, and you use the Python feature of negative indexing to get the last item from the list to put into classLabelVector. You have to explicitly tell the interpreter that you’d like the integer version of the last item in the list, or it will give you the string version. Usually, you’d have to do this, but NumPy takes care of those details for you.
Upvotes: 2
Views: 446
Reputation: 34027
Strings like "largeDoses" cannot be converted to integers. In the Ch02 folder of that code project there are two data files; use the second one, datingTestSet2.txt, instead of loading the first.
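If you would rather keep loading the first file, one possible workaround is to translate the label names yourself before storing them. This is only a sketch under assumptions: LABEL_CODES and parse_label are hypothetical names, and the integer codes are chosen arbitrarily here, so they are not necessarily the ones used in datingTestSet2.txt.

```python
# Hypothetical mapping; the integer codes are arbitrary and for illustration only.
LABEL_CODES = {"didntLike": 1, "smallDoses": 2, "largeDoses": 3}

def parse_label(token):
    """Return an integer class label for a named or numeric token."""
    token = token.strip()
    if token in LABEL_CODES:
        return LABEL_CODES[token]
    # Fall back to int(); this still raises ValueError for unknown names.
    return int(token)

print(parse_label("largeDoses"))  # 3
print(parse_label("2"))           # 2
```

With something like this, the problem line could become classLabelVector.append(parse_label(listFromLine[-1])).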
Upvotes: 3
Reputation: 832
You can use ast.literal_eval and catch the ValueError it raises for a malformed string (by the way, int('9.4') will also raise an exception).
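A minimal sketch of that idea follows; parse_value is a hypothetical helper name, and catching SyntaxError as well is an extra precaution assumed here for inputs such as an empty string:

```python
import ast

def parse_value(text):
    """Return the Python literal spelled by text, or None if it is not one."""
    try:
        return ast.literal_eval(text.strip())
    except (ValueError, SyntaxError):
        # "largeDoses" is a bare name, not a literal, so it raises ValueError
        return None

print(parse_value("3"))           # 3
print(parse_value("9.4"))         # 9.4
print(parse_value("largeDoses"))  # None
```

Note that this only tells you the label is not a number; it does not turn "largeDoses" into an integer.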
Upvotes: 1