Reputation: 3060
It sounds easy, but I don't know how to do it.
I have a numpy 2D array X with shape (1783, 30), and I want to split it into batches of 64. I wrote the code like this:
batches = abs(len(X) / BATCH_SIZE) + 1  # gives 28
I am trying to predict the results batchwise, so I fill the batch with zeros and overwrite them with the predicted results.
predicted = []
for b in xrange(batches):
    data4D = np.zeros([BATCH_SIZE, 1, 96, 96])  # create 4D input array; first value is batch_size, last is the number of inputs
    data4DL = np.zeros([BATCH_SIZE, 1, 1, 1])   # create 4D output array; first value is batch_size, last is the number of outputs
    data4D[0:BATCH_SIZE, :] = X[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]  # fill with the values of the input xtrain
    # predict
    # print [(k, v[0].data.shape) for k, v in net.params.items()]
    net.set_input_arrays(data4D.astype(np.float32), data4DL.astype(np.float32))
    pred = net.forward()
    print 'batch ', b
    predicted.append(pred['ip1'])

print 'Total in Batches ', data4D.shape, batches
print 'Final Output: ', predicted
But the last batch, number 28, has only 55 elements instead of 64 (1783 elements in total), and it gives
ValueError: could not broadcast input array from shape (55,1,96,96) into shape (64,1,96,96)
What is the fix for this?
PS: the network prediction requires a batch size of exactly 64.
Upvotes: 8
Views: 31330
Reputation: 2067
Since Python 3.12 you can use the itertools.batched function.
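For example, a quick illustration with a small range (the last batch is simply shorter):
>>> from itertools import batched
>>> list(batched(range(10), 4))
[(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]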
For older Python versions, or if you're dealing with numpy arrays, you can use np.reshape to batch an array.
Say you have batch_size=2; then use the batch size as the second dimension when reshaping:
>>> np.arange(10).reshape(-1, batch_size)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
The first dimension is the number of batches and the second is batch_size. You can iterate over the result and it will give sequential batches.
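For example, iterating over the reshaped array from above:
>>> for batch in np.arange(10).reshape(-1, batch_size):
...     print(batch)
[0 1]
[2 3]
[4 5]
[6 7]
[8 9]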
If you have a multidimensional array, such as:
>>> array_2d = np.arange(30).reshape(6, 5)
>>> array_2d
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])
You can batch using the second dimension again:
>>> array_2d.reshape(3, batch_size, 5)
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])
>>> array_2d.reshape(3, batch_size, 5)[0]  # sequential items when iterating
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
Note that this requires the first dimension to be divisible by batch_size, so either drop the remainder (e.g. array_2d[:len(array_2d) // batch_size * batch_size]) or pad with zeros (see np.pad).
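A minimal sketch of the padding option (the variable names here are just for illustration):
>>> arr = np.arange(7)
>>> pad = -len(arr) % batch_size  # extra elements needed to reach a full batch
>>> np.pad(arr, (0, pad), mode='constant').reshape(-1, batch_size)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 0]])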
Upvotes: 3
Reputation: 3633
This can be achieved using numpy's as_strided.
import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, batch_size):
    m, n = test.shape
    S = test.itemsize
    if not batch_size:
        batch_size = m
    count_batches = m // batch_size
    # Batches which can be covered fully
    test_batches = as_strided(test, shape=(count_batches, batch_size, n),
                              strides=(batch_size*n*S, n*S, S)).copy()
    covered = count_batches * batch_size
    if covered < m:
        # Pad the leftover rows with zeros to form one last full batch
        rest = test[covered:, :]
        rm, rn = rest.shape
        mismatch = batch_size - rm
        last_batch = np.vstack((rest, np.zeros((mismatch, rn)))).reshape(1, -1, n)
        return np.vstack((test_batches, last_batch))
    return test_batches
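With the shapes from the question this gives 27 full batches plus one final batch padded with 9 rows of zeros (a hypothetical usage sketch):
>>> test = np.arange(1783 * 30, dtype=float).reshape(1783, 30)
>>> batch_data(test, 64).shape
(28, 64, 30)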
Upvotes: 0
Reputation: 3060
I found a SIMPLE way of solving the batches problem: generate a dummy array and then fill it up with the necessary data.
data = np.zeros((batches*BATCH_SIZE, 1, 96, 96))  # dummy array of shape (28*64, 1, 96, 96)
This way the data is loaded with a batch size of exactly 64. The last batch just has dummy zeros at the end, but that's ok :)
pred = []
for b in xrange(batches):
    data4D = np.zeros((BATCH_SIZE, 1, 96, 96))  # per-batch input array, as in the question
    data4D[0:BATCH_SIZE, :] = data[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]  # assumes the real samples fill the first 1783 rows of data
    batch_pred = net.predict(data4D)
    pred.append(batch_pred)

output = np.concatenate(pred)[:1783]  # first 1783 slice
Finally I slice out the 1783 elements from the 28*64 total. This worked for me, but I am sure there are many other ways.
Upvotes: 1
Reputation: 13505
data4D[0:BATCH_SIZE,:] should be data4D[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :].
Upvotes: -2
Reputation: 639
I don't really understand your question either, especially what X looks like. If you want to create sub-groups of equal size from your array, try this:
def group_list(l, group_size):
    """
    :param l: list
    :param group_size: size of each group
    :return: Yields successive group-sized lists from l.
    """
    for i in xrange(0, len(l), group_size):
        yield l[i:i+group_size]
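Applied to the shapes in the question (a sketch; the last group is simply shorter, so a fixed-size network would still need it padded):
>>> import numpy as np
>>> X = np.zeros((1783, 30))
>>> groups = list(group_list(X, 64))
>>> len(groups), groups[0].shape, groups[-1].shape
(28, (64, 30), (55, 30))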
Upvotes: 16