pbu
pbu

Reputation: 3060

How to split numpy array in batches?

It sounds easy, but I don't know how to do it.

I have a 2D numpy array of shape

X = (1783, 30)

and I want to split it into batches of 64. I wrote the code like this:

batches = abs(len(X) / BATCH_SIZE) + 1  # gives 28

I am trying to predict results batchwise, so I fill each batch with zeros and then overwrite them with the predicted results.

predicted = []

for b in xrange(batches):

    data4D = np.zeros([BATCH_SIZE, 1, 96, 96])  # 4D input array: batch_size first, then number of inputs
    data4DL = np.zeros([BATCH_SIZE, 1, 1, 1])   # 4D output array: batch_size first, then number of outputs
    data4D[0:BATCH_SIZE, :] = X[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]  # fill with values of input xtrain

    #predict
    #print [(k, v[0].data.shape) for k, v in net.params.items()]
    net.set_input_arrays(data4D.astype(np.float32), data4DL.astype(np.float32))
    pred = net.forward()
    print 'batch ', b
    predicted.append(pred['ip1'])

print 'Total in Batches ', data4D.shape, batches
print 'Final Output: ', predicted

But the last batch, number 28, has only 55 elements instead of 64 (1783 elements in total), and it gives:

ValueError: could not broadcast input array from shape (55,1,96,96) into shape (64,1,96,96)

What is the fix for this?

PS: the network requires an exact batch size of 64 to predict.

Upvotes: 8

Views: 31330

Answers (5)

Tautvydas
Tautvydas

Reputation: 2067

Since Python 3.12 you can use the itertools.batched function.

For older Python versions, or if you're dealing with numpy arrays, you can use np.reshape to batch an array.

Say you have batch_size=2; then use the batch size as the second dimension when reshaping:

>>> np.arange(10).reshape(-1, batch_size)
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])

The first dimension is the number of batches and the second is batch_size. You can iterate over the result and it will give sequential batches.

If you have multidimensional array such as:

>>> array_2d = np.arange(30).reshape(6,5)
>>> array_2d
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29]])

You can batch using the second dimension again:

>>> array_2d.reshape(3, batch_size, 5)
array([[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9]],

       [[10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]],

       [[20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]]])

>>> array_2d.reshape(3, batch_size, 5)[0]  # sequential items when iterating
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

Note that this requires the first dimension to be divisible by batch_size, so either drop the remainder (e.g. array_2d[:len(array_2d) // batch_size * batch_size]) or pad with zeros (see np.pad).
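A minimal sketch of the zero-padding option with np.pad (the names and sizes here are illustrative):

```python
import numpy as np

batch_size = 4
array_2d = np.arange(30).reshape(6, 5)  # 6 rows, not divisible by 4

# Pad the first dimension up to the next multiple of batch_size with zeros.
n_pad = (-len(array_2d)) % batch_size
padded = np.pad(array_2d, ((0, n_pad), (0, 0)))  # pad rows only, leave columns
batches = padded.reshape(-1, batch_size, array_2d.shape[1])
print(batches.shape)  # (2, 4, 5); the last batch ends with two all-zero rows
```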

Upvotes: 3

MSS
MSS

Reputation: 3633

This can be achieved using numpy's as_strided.

import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, batch_size):
    m, n = test.shape
    S = test.itemsize
    if not batch_size:
        batch_size = m
    count_batches = m // batch_size
    # Batches which can be covered fully
    test_batches = as_strided(test, shape=(count_batches, batch_size, n),
                              strides=(batch_size*n*S, n*S, S)).copy()
    covered = count_batches * batch_size
    if covered < m:
        # Zero-pad the leftover rows into one final batch
        rest = test[covered:, :]
        rm, rn = rest.shape
        mismatch = batch_size - rm
        last_batch = np.vstack((rest, np.zeros((mismatch, rn)))).reshape(1, -1, n)
        return np.vstack((test_batches, last_batch))
    return test_batches

Upvotes: 0

pbu
pbu

Reputation: 3060

I found a SIMPLE way of solving the batches problem: generate a dummy array and then fill it up with the necessary data.

data = np.zeros((batches*BATCH_SIZE, 1, 96, 96))  # dummy array of 28*64,1,96,96

This code will load the data in batches of exactly 64. The last batch will have dummy zeros at the end, but that's ok :)

predicted = []
for b in xrange(batches):
    data4D = data[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]  # always exactly 64 rows
    pred = net.predict(data4D)
    predicted.append(pred)

output = np.vstack(predicted)[:1783]  # slice out the first 1783 real predictions

Finally I slice out the 1783 real elements from the 28*64 total. This worked for me, but I am sure there are other ways.

Upvotes: 1

Peter
Peter

Reputation: 13505

data4D[0:BATCH_SIZE,:] should be data4D[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :].

Upvotes: -2

poli_g
poli_g

Reputation: 639

I don't really understand your question either, especially what X looks like. If you want to create equal-sized sub-groups of your array, try this:

def group_list(l, group_size):
    """
    :param l:           list
    :param group_size:  size of each group
    :return:            Yields successive group-sized lists from l.
    """
    for i in xrange(0, len(l), group_size):
        yield l[i:i+group_size]
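For instance, applied to a 2D numpy array (a sketch in Python 3 spelling, range instead of xrange; the final batch is simply shorter):

```python
import numpy as np

def group_list(l, group_size):
    # Yield successive group_size-sized slices from l.
    for i in range(0, len(l), group_size):
        yield l[i:i+group_size]

X = np.arange(12).reshape(6, 2)
print([batch.shape for batch in group_list(X, 4)])  # [(4, 2), (2, 2)]
```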

Upvotes: 16
