Reputation: 6122
I am making a 3D array of zeros and then filling it, but due to the size of the NumPy array it runs into memory issues even with 64 GB of RAM. Am I doing it wrong?
X_train_one_hot shape is (47827, 30, 20000) and encInput is of shape (47827, 30, 200)
X_train_one_hot_shifted = np.zeros((X_train_one_hot.shape[0], 30, 20200))
# X_train_one_hot.shape[0] = 47827 (~48000)
for j in range(X_train_one_hot.shape[0]):
    current = np.zeros((30, 20000))
    current[0, 0] = 1                       # start-of-sequence marker
    current[1:] = X_train_one_hot[j, :29]   # shift one time step
    # print(current.shape, encInput[j].shape)
    combined = np.concatenate((current, encInput[j]), axis=1)
    X_train_one_hot_shifted[j] = combined
Any ideas to reduce memory consumption? Another interesting thing: X_train_one_hot has almost the same shape, yet allocating it does not throw any error.
EDIT : The program gets killed in the for loop with the error message :
TERM_MEMLIMIT: job killed after reaching LSF memory usage limit.
Also, most of the array is sparse, since X_train_one_hot is a one-hot encoding of size 20000.
Upvotes: 2
Views: 1114
Reputation: 10221
Imtinan Azhar is correct. You simply do not have enough RAM to hold the array.
You have a few options.
1) You seem to have a very sparse matrix even though its size is large, so you can try one of the sparse matrix representations from SciPy (scipy.sparse).
If you are feeding the array into a library package such as scikit-learn or one of the deep learning libraries, this will likely not work unless it accepts sparse input.
2) Most DL libraries don't need you to load all your data at once. You can prepare your data in batches - create this matrix in batch and save it out to file (preferably using a sparse matrix representation). Then use a data generator to feed your algorithm, or manually load in batches of your data for your algorithm.
3) If none of these is possible, you can try to memory-map the array using NumPy's memmap. Some further examples can be found here.
4) Another option is to use dask and manually get slices of the data when necessary.
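Option 3 above can be sketched roughly as follows. This is a minimal illustration with toy sizes (the real shape from the question is (47827, 30, 20200)); the filename is an assumption, and float32 is used to halve the on-disk footprint:

```python
import numpy as np

# Toy sizes; the question's real shape is (47827, 30, 20200)
n_samples, seq_len, width = 8, 30, 20200

# mode='w+' creates the backing file; slices are paged in on demand
# instead of the whole array living in RAM
X_shifted = np.memmap('X_train_one_hot_shifted.dat', dtype=np.float32,
                      mode='w+', shape=(n_samples, seq_len, width))

for j in range(n_samples):
    row = np.zeros((seq_len, width), dtype=np.float32)
    row[0, 0] = 1.0                  # start-of-sequence marker, as in the question
    X_shifted[j] = row               # written through to the file
X_shifted.flush()

# Later (or in another process): reopen read-only and slice out batches
X_ro = np.memmap('X_train_one_hot_shifted.dat', dtype=np.float32,
                 mode='r', shape=(n_samples, seq_len, width))
print(X_ro[0, 0, 0])  # 1.0
```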
Personally, I would go with option 2, or 1 if your algorithms that take in the matrix can handle (or be modified to handle) sparse matrices.
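A minimal sketch of option 1, storing each sample's (30, 20200) slice as its own CSR matrix so only the nonzeros are kept. Sizes are toy values, and the names X_train_one_hot / encInput follow the question:

```python
import numpy as np
import scipy.sparse as sp

# Toy sizes; the question uses 47827 / 30 / 20000 / 200
n_samples, seq_len, vocab, enc_dim = 4, 30, 20000, 200
X_train_one_hot = np.zeros((n_samples, seq_len, vocab), dtype=np.float32)
encInput = np.random.rand(n_samples, seq_len, enc_dim).astype(np.float32)

shifted_sparse = []
for j in range(n_samples):
    current = np.zeros((seq_len, vocab), dtype=np.float32)
    current[0, 0] = 1.0                      # start-of-sequence marker
    current[1:] = X_train_one_hot[j, :-1]    # shift one time step
    combined = np.concatenate((current, encInput[j]), axis=1)
    shifted_sparse.append(sp.csr_matrix(combined))  # keeps only nonzeros

# shifted_sparse[j] is a (30, 20200) CSR matrix;
# .toarray() densifies a single sample when a batch is needed
print(shifted_sparse[0].shape)  # (30, 20200)
```

Since scipy.sparse matrices are 2-D, a list (or a single stacked (n_samples * 30, 20200) CSR matrix) stands in for the 3-D array; densify only the batch you are about to feed to the model.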
Upvotes: 3
Reputation: 1753
Let's see: your X_train_one_hot_shifted has shape (47827, 30, 20200), which is 28,983,162,000 floats.
28983162000 * 8
gives you the memory consumption of this array in bytes,
which is 231865296000 bytes.
Let's simplify this:
231,865,296,000 B
226,430,953.125 KB
221,123.98 MB
215.94 GB
You would need about 216 GB of RAM to fit X_train_one_hot_shifted into memory. Note that the last dimension, 20200, is not a typo: it comes from concatenating the 20000-wide one-hot slice with the 200-wide encInput slice, so ask yourself whether you really need the full dense array.
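The arithmetic above can be sanity-checked with NumPy itself (np.zeros defaults to float64, i.e. 8 bytes per element):

```python
import numpy as np

shape = (47827, 30, 20200)
n_floats = np.prod(shape, dtype=np.int64)            # 28_983_162_000 elements
n_bytes = n_floats * np.dtype(np.float64).itemsize   # 8 bytes per float64
print(n_bytes)            # 231865296000
print(n_bytes / 1024**3)  # ≈ 215.94 (GB)
# dtype=np.float32 would halve this; a sparse format shrinks it
# by orders of magnitude for a one-hot array
```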
Upvotes: 1