Reputation: 1851
I'm using a (numpy) array of integers to log potential problems with an array of data. The concept is that each error type has its own integer value, and that these are set so that
err1 = 1
err2 = 2 ** 1
err3 = 2 ** 2
...
errx = 2 ** x
This way, I figure, I can add these error types to the integer logging array and still know what combination of errors made up each value; so if the end array has a value of 7, I know it must have been made up of 1 + 2 + 4, i.e. err1, err2, and err3.
This all seemed very clever at the time, but I now need to produce a boolean array telling me which cells have logged a given error; so, for example, if I have an error array of
test_arr = np.array(
[[1, 5, 19],
[3, 4, 12]]
)
I'd like to get the result
test_contains_err3 = np.array(
[[False, True, False],
[False, True, True]]
)
Because the value 4 has gone into making up the values 5, 4, and 12, but not any of the others. I've developed an iterative solution for single values, but that doesn't work well as a vectorized calculation (the actual array is quite large). Can anyone please suggest something? I have a feeling that there's something simpler here that I'm not seeing.
Thanks in advance!
Upvotes: 0
Views: 122
Reputation: 3138
You should look into bitwise operations. That would allow you to encode multiple different numbers in a single joined value, for example the output of the following snippet
a = (3 << 24) + (8 << 16) + 5
print(a)
print(a>>24 & 0xf)
print(a>>16 & 0xf)
print(a & 0xf)
would look like this:
50855941
3
8
5
Now if you play around with it, you can encode as many variables as you want as long as you make sure to give each variable enough bits to cover the maximum possible value for that variable - an overflow of a single variable would corrupt your data.
Now when you need to check which errors have fired, you test the value against the bitmask (the bit position) of a particular error, and you will know whether that error has been registered.
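As a minimal sketch of that mask check applied to the array from the question (the constant names `ERR1`, `ERR2`, `ERR3` are just illustrative labels for the question's `err1`..`err3`), NumPy's `&` operator works element-wise, so no loop is needed:

```python
import numpy as np

# Flag values as defined in the question: one bit per error type.
ERR1 = 1        # 2 ** 0
ERR2 = 2 ** 1
ERR3 = 2 ** 2

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)

# Bitwise AND keeps only the ERR3 bit of every cell; a non-zero result means it was set.
test_contains_err3 = (test_arr & ERR3) != 0
print(test_contains_err3)
```

This should reproduce the `test_contains_err3` array asked for in the question. When logging errors, setting a flag with `|=` rather than `+=` is probably safer, since adding the same error twice would carry into the next bit.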
It seems to me that for your problem you would only need to know which errors have occurred and don't need to save the error codes.
You can then employ a simplified scenario where you would reserve 1 bit per error and keep a bit->error map in code.
Finally, when you want to display which errors were triggered, you simply need to take the binary value of the encoded number and convert 1's to True and 0's to False.
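One way to do that conversion for every error at once is sketched below. It assumes a fixed number of error types (`n_errors = 5` is an arbitrary choice here, not something from the question) and uses broadcasting to build one boolean plane per bit:

```python
import numpy as np

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)

n_errors = 5                       # assumed number of error types (covers values up to 31)
masks = 1 << np.arange(n_errors)   # [1, 2, 4, 8, 16], one mask per error

# A trailing axis is added so each cell is tested against every mask:
# flags[..., k] is True where the error with value 2**k was logged.
flags = (test_arr[..., np.newaxis] & masks) != 0

print(flags.shape)     # one boolean per cell per error type
print(flags[..., 2])   # plane for err3 (value 4)
print(flags[0, 2])     # all flags for the cell holding 19 (= 16 + 2 + 1)
```

The result has the shape of the input plus one extra axis of length `n_errors`, so memory use grows with the number of error types; for a very large array it may be preferable to test one mask at a time.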
Upvotes: 2
Reputation: 15872
I may have a solution, please check if this works for you:
>>> func = lambda x,y: bin(y)[-x] == u'1' if y >= 2**(x-1) else False
>>> func_vec = np.vectorize(func)
>>> check_for_error = 3 # to check err3 = 2**2 = 4
>>> func_vec(check_for_error, test_arr)
array([[False, True, False],
[False, True, True]])
>>> check_for_error = 4 # to check err4 = 2**3 = 8
>>> func_vec(check_for_error, test_arr)
array([[False, False, False],
[False, False, True]]) # only true for 12 (= 8 + 4)
The logic is that, when a number is written in binary, you can find which powers of two were used to construct it by checking the positions of the 1s in its binary form.
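The same position-based idea can also be written with integer shifts instead of string indexing, which avoids `np.vectorize` (essentially a Python-level loop). This is only a sketch; `has_error` is a hypothetical helper name, and it keeps the 1-based error index used above:

```python
import numpy as np

test_arr = np.array(
    [[1, 5, 19],
     [3, 4, 12]]
)

def has_error(arr, index):
    """Hypothetical helper: True where the 1-based error `index` (value 2**(index-1)) is set."""
    # Shift the wanted bit down to the lowest position, then isolate it.
    return ((arr >> (index - 1)) & 1).astype(bool)

print(has_error(test_arr, 3))   # err3 = 2**2 = 4
print(has_error(test_arr, 4))   # err4 = 2**3 = 8
```

For the sample array this should give the same two boolean arrays as the `func_vec` calls above.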
If you want to pass the error value itself (the power of two) rather than its index, for example to check for 8, i.e. 2**3, you can use the function as:
import numpy as np
import math
test_arr = np.array(
[[1, 5, 19],
[3, 4, 12]]
)
# The bit for value x sits log2(x) + 1 characters from the right of bin(y)
func = lambda x,y: bin(y)[-(int(math.log2(x)) + 1)] == u'1' if y >= x else False
func_vec = np.vectorize(func)
check_for_error = 8
print(func_vec(check_for_error, test_arr))
Output:
[[False False False]
[False False True]] # checking for 8. 8 found in 12 (= 8 + 4)
EDIT: A method for finding out all the errors that make up the number:
>>> test_arr = np.array([[ 1, 5, 19],
... [ 3, 4, 12],
... [ 7, 27, 59]])
>>> func = lambda x: ','.join([str(2**i) for i,j in enumerate(reversed(bin(x))) if j==u'1'])
>>> func_vec = np.vectorize(func)
>>> func_vec(test_arr)
array([['1', '1,4', '1,2,16'],
['1,2', '4', '4,8'],
['1,2,4', '1,2,8,16', '1,2,8,16,32']], dtype='<U11')
Upvotes: 1