sol

Reputation: 1490

Fastest way to create and fill huge numpy 2D-array?

I have to create and fill a huge array (e.g. 96 GB, 72000 rows * 72000 columns) with floats, where each cell holds a value that comes from mathematical formulas. The array will be processed afterwards.

import itertools, operator, time, copy, os, sys
import numpy 
from multiprocessing import Pool


def f2(x):  # stands in for more complex mathematical formulas that depend on the values in i and x
    temp = []
    for i in combine:
        temp.append(0.2*x[1]*i[1]/64.23)
    return temp

def combinations_with_replacement_counts(n, r):
    # yield all ways to put r indistinguishable balls into n boxes,
    # i.e. all n-tuples of non-negative ints summing to r,
    # e.g. (3, 2) -> (0,0,2), (0,1,1), (0,2,0), (1,0,1), (1,1,0), (2,0,0)
    size = n + r - 1
    for indices in itertools.combinations(range(size), n-1):
        starts = [0] + [index+1 for index in indices]
        stops = indices + (size,)
        yield tuple(map(operator.sub, stops, starts))

global combine
combine = list(combinations_with_replacement_counts(3, 60))  # 60 used here for testing; the real case needs 350
print len(combine)
if __name__ == '__main__':
    t1 = time.time()
    pool = Pool()              # start worker processes
    results = [pool.apply_async(f2, (x,)) for x in combine]
    roots = [r.get() for r in results]
    print roots[0:3]
    pool.close()
    pool.join()
    print time.time()-t1

Upvotes: 6

Views: 3016

Answers (2)

segasai

Reputation: 8548

I know that you can create shared numpy arrays that can be written to from different processes (assuming the changed regions don't overlap). Here is a sketch of the code you can use to do that (I saw the original idea somewhere on Stack Overflow; edit: here it is: https://stackoverflow.com/a/5550156/1269140 )

import multiprocessing as mp
import numpy as np
import ctypes

def shared_zeros(n1, n2):
    # create a 2D numpy array backed by shared memory, so worker
    # processes can write to it without copying
    shared_array_base = mp.Array(ctypes.c_double, n1 * n2)
    # wrap the shared buffer as an ndarray (no copy is made)
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(n1, n2)
    return shared_array

class singleton:
    # module-level holder for the shared array, inherited by forked workers
    arr = None

def dosomething(i):
    # do something with singleton.arr
    singleton.arr[i,:] = i
    return i

def main():
    singleton.arr = shared_zeros(1000, 1000)
    pool = mp.Pool(16)
    pool.map(dosomething, range(1000))

if __name__=='__main__':
    main()

Upvotes: 1

shx2

Reputation: 64388

You can create an empty numpy.memmap array with the desired shape, and then use multiprocessing.Pool to populate its values. Doing it correctly also keeps the memory footprint of each process in your pool relatively small.
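
A minimal sketch of that approach, assuming a hypothetical backing file huge.dat and a deliberately small shape (the question's case would be shape=(72000, 72000)): each worker opens the same memmap once, and because every task writes a distinct row, no locking is needed and only the pages a worker actually touches are pulled into its memory.

    import numpy as np
    from multiprocessing import Pool

    FNAME = 'huge.dat'    # hypothetical path for the backing file
    SHAPE = (1000, 1000)  # small for the sketch; the real case is (72000, 72000)

    def init_worker():
        # open the memmap once per worker process instead of once per task
        global arr
        arr = np.memmap(FNAME, dtype='float64', mode='r+', shape=SHAPE)

    def fill_row(i):
        arr[i, :] = 0.2 * i / 64.23  # stand-in for the real formula
        return i

    if __name__ == '__main__':
        # create the backing file at its full size before starting the pool
        np.memmap(FNAME, dtype='float64', mode='w+', shape=SHAPE).flush()
        pool = Pool(initializer=init_worker)
        pool.map(fill_row, range(SHAPE[0]))
        pool.close()
        pool.join()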

Upvotes: 0
