Reputation: 373
as stated in the title I want to make a calculation where instead of multiplying corresponding elements I binary XOR them and then add them. Example for illustration:
EDIT: The big picture above IS the calculation but here we go: Take first row from the left [1 0 1] and first column from top matrix [1 0 0]. 1 XOR 1 = 0, 0 XOR 0 = 0, 1 XOR 0 = 1. Add them all 0 + 0 + 1 = 1. First row from the left [1 0 1], second column [0 0 0]: 1 XOR 0 = 1, 0 XOR 0 = 0, 1 XOR 0 = 1. Add them all 1 + 0 + 1 = 2. And so on
Is it possible to do that in numpy?
Upvotes: 5
Views: 1275
Reputation: 1
I was trying to find something that is simpler for me to understand and runs faster. So, I came up with this.
To try and avoid loops, we can convert this problem into a matrix multiplication problem. Matrix multiplications are highly optimized in libraries like numpy(I guess).
XOR operation followed by summing calculates number of dissimilar bits between two bit vectors. We could count this value indirectly by using bipolar vectors, i.e. converting the 1,0 bit vectors to 1,-1 vectors.
Steps:
The matrix can be seen as a stack of bit vectors of length m. You can assume these bit vectors to be the rows of M1 and the columns of M2.
Obtain the product matrix M1 x M2.
Recover the number of dissimilar bits from the values in the product matrix using the following formula.
Each bit vector has m bits. m = length of bit vector
When comparing two bit vectors,
let k = number of similar bits
=> m-k = number of dissimilar bits
let r = intermediate result = result obtained by taking dot product of two 1,-1 vectors
clearly,
r = 1*k + (-1) (m-k)
r = k -m + k
r = 2k -m --(1)
OR
k = (m+r)/2 --(2) = sum(XNOR)
Given bit vectors, we know m. We can compute r as descibed above.
We can compute k using (2)
But we need the number of dissimilar bits (m-k)
m-k = m - ((m+r)/2) = (2m-m-r)/2 = (m-r)/2
number of dissimilar bits = (m-k) = (m-r)/2 = sum(XOR)
Doing the same at the matrix level, we get:
Actual code:
import numpy as np
def matXOR_matXNOR(M1,M2):
N1 = 2*M1 - 1; N2 = 2*M2 - 1 #convert to bipolar vectors
m1, n = N1.shape #N2 is n x m2 matrix
intr = np.matmul(N1,N2) #m1 x m2 matrix -> intermediate result
xorMat = (m-intr)//2
xnorMat = (m+intr)//2
return xorMat, xnorMat
Upvotes: 0
Reputation: 16747
You can just make a combination of two loops and Numpy 1D xor-sum, like below:
import numpy as np
m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
print(mr)
Output:
[[1 2 2]
[2 1 1]
[2 3 3]]
As @MadPhysicist suggested you can use Numba JIT-optimizer (pip install numba
) to boost code above and you'll get very fast code for you operations with small memory consumption:
import numpy as np, numba
@numba.njit(cache = True)
def matxor(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
return mr
m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
print(matxor(m1, m2))
Also Numba code above can be boosted up to 44x times more thanks to following great improvements suggested and coded by @max9111:
import numpy as np, numba
m1 = np.random.randint(low=0, high=1,size=1_000_000).reshape(1_000,1_000)
m2 = np.random.randint(low=0, high=1,size=1_000_000).reshape(1_000,1_000)
#@Arty
@numba.njit(cache = True)
def matxor_1(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
return mr
%timeit matxor_1(m1, m2)
#1.06 s ± 9.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Aligned memory acces (real transpose the ascontiguousarray is important)
@numba.njit(cache = True)
def matxor_2(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
m1_T=np.ascontiguousarray(m1.T)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
mr[i, j] = np.sum(m1_T[j, :] ^ m2[i, :])
return mr
%timeit matxor_2(m1, m2)
#312 ms ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Writing out the inner loop
@numba.njit(fastmath=True,cache = True)
def matxor_3(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
m1_T=np.ascontiguousarray(m1.T)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
acc=0
for k in range(m2.shape[1]):
acc+=m1_T[j, k] ^ m2[i, k]
mr[i, j] = acc
return mr
%timeit matxor_3(m1, m2)
#125 ms ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Parallelization
@numba.njit(fastmath=True,cache = True,parallel=True)
def matxor_4(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
m1_T=np.ascontiguousarray(m1.T)
for i in numba.prange(mr.shape[0]):
for j in range(mr.shape[1]):
acc=0
for k in range(m2.shape[1]):
acc+=m1_T[j, k] ^ m2[i, k]
mr[i, j] = acc
return mr
%timeit matxor_4(m1, m2)
#23.8 ms ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(np.allclose(matxor_1(m1, m2),matxor_2(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_3(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_4(m1, m2)))
#True
Upvotes: 1
Reputation: 6492
This is just a longer comment on Artys answer. There are a few things to speed up the Numba function.
Steps to improve performance
import numpy as np, numba
m1 = np.random.randint(low=0, high=1,size=1_000_000).reshape(1_000,1_000)
m2 = np.random.randint(low=0, high=1,size=1_000_000).reshape(1_000,1_000)
#@Arty
@numba.njit(cache = True)
def matxor_1(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
return mr
%timeit matxor_1(m1, m2)
#1.06 s ± 9.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Aligned memory acces (real transpose the ascontiguousarray is important)
@numba.njit(cache = True)
def matxor_2(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
m1_T=np.ascontiguousarray(m1.T)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
mr[i, j] = np.sum(m1_T[j, :] ^ m2[i, :])
return mr
%timeit matxor_2(m1, m2)
#312 ms ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Writing out the inner loop
@numba.njit(fastmath=True,cache = True)
def matxor_3(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
m1_T=np.ascontiguousarray(m1.T)
for i in range(mr.shape[0]):
for j in range(mr.shape[1]):
acc=0
for k in range(m2.shape[1]):
acc+=m1_T[j, k] ^ m2[i, k]
mr[i, j] = acc
return mr
%timeit matxor_3(m1, m2)
#125 ms ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
#Parallelization
@numba.njit(fastmath=True,cache = True,parallel=True)
def matxor_4(m1, m2):
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
m1_T=np.ascontiguousarray(m1.T)
for i in numba.prange(mr.shape[0]):
for j in range(mr.shape[1]):
acc=0
for k in range(m2.shape[1]):
acc+=m1_T[j, k] ^ m2[i, k]
mr[i, j] = acc
return mr
%timeit matxor_4(m1, m2)
#23.8 ms ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(np.allclose(matxor_1(m1, m2),matxor_2(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_3(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_4(m1, m2)))
#True
Upvotes: 3
Reputation: 3368
Try this:
M1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
M2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
(M1 ^ M2[:,None]).sum(-1)
Output:
array([[1, 2, 2],
[2, 1, 1],
[2, 3, 3]])
EDIT
If you want to preallocate memory:
intermediary = np.empty((3,3,3), dtype=np.int32)
np.bitwise_xor(M1, M2[:,None], out=intermediary).sum(-1)
Upvotes: 4