Reputation: 2103
I need to implement in Python the formula in the image. The set B is a set of real numbers -1 <= t <= 1. Also, F_2 is a set with the elements 0 and 1. y_i is the i-th element of $y$. This formula is part of an iterative part of my program. The number of iterations is about 10K (over x) and n is 32. I tried a pre-computation approach. That is, I precomputed all f(y) and put it in a numpy array, also I computed (-1**y_i) and put it in a numpy array. See code below. But even with that, the computation takes approx 3s for the non-precomputation and n=24. When I tried to n=32 I get some memory issues due to the big arrays. My question is. Could you help me to improve this code in terms of memory and time, please?
import time
from multiprocessing import Pool
import multiprocessing
import numpy as np
import numexpr as ne
import galois
def precomputation_part(linear_layer_matrix, dim):
y_shift = np.fromfunction(lambda i,j: (i)>>j, ((2**dim), dim), dtype=np.uint32)
um=np.ones(((2**dim), dim), dtype=np.uint8);
and_list = np.bitwise_and(y_shift,um, dtype=np.uint8)
minus1_power_x_t = np.power(-1,and_list, dtype=np.int8)
fy_matrix = evaluated_f_bool_matrix(and_list, np.array(linear_layer_matrix,dtype=np.uint8))
boolean_function_0 = fy_matrix[0]
return boolean_function_0, minus1_power_x_t
def evaluated_f_bool_matrix(and_components, input_matrix):
t1 = time.time()
mc = and_components.transpose()
mal = input_matrix
return np.matmul(mal,mc,dtype=np.uint8)%2
def create_ext_component_function(dim, boolean_function, input_lst, minus1_power_x_t):
um_minus_minus1_power = ne.evaluate('1-minus1_power_x_t*input_lst')
um_minus_minus1_power_prod = um_minus_minus1_power.prod(axis=1)
boolean_function_times_prod = np.dot(boolean_function, um_minus_minus1_power_prod)
return boolean_function_times_prod*(1/((2**dim)-1))-1
if __name__ == '__main__':
dim = 24
f_matrix = [[0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1], [0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1], [1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1], [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0], [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0], [1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1], [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1], [0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0], [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0], [0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1], [0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1], [0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1], [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1], [0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1], [0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0], [1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1], [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1], [1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1]]
boolean_function_0, minus1_power_x_t = precomputation_part(f_matrix, dim)
########The below two lines are executed several times
x = np.array([i+1 for i in range(dim)], dtype=float)
print(create_ext_component_function(dim, boolean_function_0, x, minus1_power_x_t))
Upvotes: 0
Views: 85
Reputation: 413
This is not a full answer but here are some ideas for optimization.
f(y) returns only 0 or 1, correct? If so, only compute the product where f(y)=1. When f(y)=0, the product calculation is wasted. This becomes very important if the probability of f(y)=1 is low, less so if not.
The sum is over all integers in 2^n-1, correct? That would be a very large matrix if each bit is stored as a byte and especially if n goes even larger than 32. I do not think precomputing is practical unless it is chunked. Since you have a sum, there is a logical way to chunk across y.
You can probably reuse calculation of the lower bits of y as you calculate the upper bits of y since x is constant over the sum as a divide and conquer (helpful with item 2).
I really wish we had latex here but:
This optimization is not compatible with optimization item 1.
import numpy as np
n = 32
x = np.random.uniform(-1,1,n).astype(np.float16)
prods = np.array([1-x[0],1+x[0]]).astype(np.float16)
a = np.array([-1, 1]).astype(np.float16)
for i in range(1, n):
prods = np.kron(
1+a*x[i],
prods
)
Also consider using a different float precision to save memory.
Not a speed thing, but it looks like you are multiplying a lot of numbers which could potentially underflow or overflow. Instead of calculating the product, sum the log of the element wise product and then exponentiate the total of the logs.
Log-Sum-Exp Trick to Prevent Numerical Underflow
Upvotes: 1