Reputation: 3089
I have a NumPy 2-D array in which one column has Boolean values, i.e. True/False. I want to convert them to the integers 1 and 0 respectively. How can I do it?
E.g. my data[0::,2] is boolean. I tried data[0::,2]=int(data[0::,2]), but it gives me this error:
TypeError: only length-1 arrays can be converted to Python scalars
My first 5 rows of array are:
[['0', '3', 'True', '22', '1', '0', '7.25', '0'],
['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
['1', '3', 'False', '26', '0', '0', '7.925', '0'],
['1', '1', 'False', '35', '1', '0', '53.1', '0'],
['0', '3', 'True', '35', '0', '0', '8.05', '0']]
Upvotes: 21
Views: 67036
Reputation: 691
Old question, but for reference: a bool can be converted to an int, and an int to a float:
data[0::,2]=data[0::,2].astype(int).astype(float)
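A minimal sketch of where this conversion applies, assuming small made-up arrays (the names flags, strings and table are illustrative, not from the question). It works on a genuinely boolean array, fails on the text 'True'/'False', and shows that assigning numbers back into a string array casts them to strings again, because an array has a single dtype:

```python
import numpy as np

# A genuinely boolean array converts directly.
flags = np.array([True, False, False, True])
as_float = flags.astype(int).astype(float)

# The question's array holds strings, so the same call fails on it.
strings = np.array(['True', 'False'])
try:
    strings.astype(int)
    failed = False
except ValueError:
    failed = True

# Assigning numbers into a string array casts them back to strings.
table = np.array([['0', 'True'], ['1', 'False']])
table[:, 1] = (table[:, 1] == 'True').astype(int)
```

So for the string data in the question, the whole array needs to be converted into a new numeric array rather than patched column by column.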
Upvotes: 0
Reputation: 840
boolarrayvariable.astype(int) works:
import numpy as np

data = np.random.normal(0,1,(1,5))
threshold = 0
test1 = (data>threshold)     # boolean array
test2 = test1.astype(int)    # True -> 1, False -> 0
Output:
data = array([[ 1.766, -1.765, 2.576, -1.469, 1.69]])
test1 = array([[ True, False, True, False, True]], dtype=bool)
test2 = array([[1, 0, 1, 0, 1]])
Upvotes: 15
Reputation: 2696
If I do this on your raw data source, which is strings:
data = [['0', '3', 'True', '22', '1', '0', '7.25', '0'],
['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
['1', '3', 'False', '26', '0', '0', '7.925', '0'],
['1', '1', 'False', '35', '1', '0', '53.1', '0'],
['0', '3', 'True', '35', '0', '0', '8.05', '0']]
data = [[eval(x) for x in y] for y in data]
...and then follow that with:
data = [[float(x) for x in y] for y in data]
# or this if you prefer:
arr = numpy.array(data)
...then the problem is solved. You can even do it as a one-liner (I think this makes ints, though, and floats are probably needed): numpy.array([[eval(x) for x in y] for y in data])
I think the problem is that numpy is keeping your numeric strings as strings, and since not all of your strings are numeric, you can't do a type conversion on the whole array. Also, if you try to convert just the parts of the array that contain "True" and "False", you're not really working with booleans, but with strings. The only way I know of to change that is to use eval. Well, you could do this, too:
booltext_int = {'True': 1, 'False': 0}
clean = [[float(x) if x[-1].isdigit() else booltext_int[x]
          for x in y] for y in data]
This way you avoid eval, which is inherently insecure. But that may not matter, since you may be using a trusted data source.
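For comparison, here is a sketch of an eval-free variant that stays inside NumPy, assuming the data is already a string array (raw, cleaned and as_float are illustrative names). It swaps the boolean text for numeric text with np.where and then converts the whole array in one step:

```python
import numpy as np

raw = np.array([['0', '3', 'True', '22', '1', '0', '7.25', '0'],
                ['1', '1', 'False', '38', '1', '0', '71.2833', '1']])

# Replace the boolean text with numeric text, keeping everything else as is.
cleaned = np.where(raw == 'True', '1', np.where(raw == 'False', '0', raw))

# Every entry is now a valid float literal, so one conversion is enough.
as_float = cleaned.astype(float)
```

This only handles the exact strings 'True' and 'False'; any other non-numeric text would still raise a ValueError in the final conversion.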
Upvotes: 2
Reputation: 133554
Using @kirelagin's idea with ast.literal_eval
>>> import ast
>>> import numpy as np
>>> arr = np.array(
[['0', '3', 'True', '22', '1', '0', '7.25', '0'],
['1', '1', 'False', '38', '1', '0', '71.2833', '1'],
['1', '3', 'False', '26', '0', '0', '7.925', '0'],
['1', '1', 'False', '35', '1', '0', '53.1', '0'],
['0', '3', 'True', '35', '0', '0', '8.05', '0']])
>>> np.vectorize(ast.literal_eval, otypes=[float])(arr)
array([[ 0. , 3. , 1. , 22. , 1. , 0. ,
7.25 , 0. ],
[ 1. , 1. , 0. , 38. , 1. , 0. ,
71.2833, 1. ],
[ 1. , 3. , 0. , 26. , 0. , 0. ,
7.925 , 0. ],
[ 1. , 1. , 0. , 35. , 1. , 0. ,
53.1 , 0. ],
[ 0. , 3. , 1. , 35. , 0. , 0. ,
8.05 , 0. ]])
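To see why this works, here is a small sketch of what ast.literal_eval does on individual strings (the sample strings are made up for illustration). It parses Python literals, float() then maps the booleans to 1.0 and 0.0, and, unlike eval, it refuses anything that is not a plain literal:

```python
import ast

# Literals are parsed into the corresponding Python objects.
values = [ast.literal_eval(s) for s in ['0', 'True', 'False', '7.25']]

# float() maps True to 1.0 and False to 0.0.
floats = [float(v) for v in values]

# A function call is not a literal, so it is rejected rather than executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```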
Upvotes: 1
Reputation: 13616
OK, the easiest way to change the type of any array to float is:
data.astype(float)
The issue with your array is that float('True') is an error, because 'True' can't be parsed as a float number. So, the best thing to do is to fix your array generation code to produce floats (or, at least, strings with valid float literals) instead of bools.
In the meantime you can use this function to fix your array:
def boolstr_to_floatstr(v):
    if v == 'True':
        return '1'
    elif v == 'False':
        return '0'
    else:
        return v
And finally you convert your array like this:
new_data = np.vectorize(boolstr_to_floatstr)(data).astype(float)
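Put together on the first two rows from the question, the approach might look like the sketch below. The last line, which pulls the former boolean column out as integers, is an addition for illustration and not part of the answer above:

```python
import numpy as np

def boolstr_to_floatstr(v):
    # Map the boolean text to numeric text; leave everything else alone.
    if v == 'True':
        return '1'
    elif v == 'False':
        return '0'
    return v

data = np.array([['0', '3', 'True', '22', '1', '0', '7.25', '0'],
                 ['1', '1', 'False', '38', '1', '0', '71.2833', '1']])

# Convert the whole string array to floats in a new array.
new_data = np.vectorize(boolstr_to_floatstr)(data).astype(float)

# The former boolean column as integer 1/0, in a separate array.
flags = new_data[:, 2].astype(int)
```

Because an array has one dtype, new_data stays all-float; the integer version of the column lives in its own array.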
Upvotes: 25