Reputation: 301
I have a huge 2D numpy array that I want to retrieve in batches.
The array shape is (60000, 3072).
I want to make a generator that gives me chunks of this array, e.g. (1000, 3072), then the next (1000, 3072), and so on. How can I make a generator that iterates over this array and yields batches of a given size?
Upvotes: 2
Views: 1327
Reputation: 111
I wanted to use a generator as suggested by ChiefAmay, but his Option 1 only returns whole chunks and drops the leftover chunk at the end. Here is an improved solution that returns every part of the array:
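import numpy as np

def get_every_n(a, n=2):
    full_chunks_len = a.shape[0] // n
    for i in range(full_chunks_len):
        yield a[n*i:n*(i+1)]
    # Yield the leftover rows only if there are any,
    # so no empty chunk is produced when n divides the row count
    if a.shape[0] % n:
        yield a[full_chunks_len*n:]

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12],
              [13, 14, 15]])

for chunk in get_every_n(a):
    print(chunk)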
Output:
[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
[[13 14 15]]
Upvotes: 1
Reputation: 130
Consider the array a:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])
Option 1
Use a generator
def get_every_n(a, n=2):
    # Yields only full chunks of n rows; leftover rows are dropped
    for i in range(a.shape[0] // n):
        yield a[n*i:n*(i+1)]

for sa in get_every_n(a):
    print(sa)

[[1 2 3]
 [4 5 6]]
[[ 7  8  9]
 [10 11 12]]
Option 2
Use reshape and //:
a.reshape(a.shape[0] // 2, -1, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
Option 3
If you wanted groups of two rather than two groups:
a.reshape(-1, 2, a.shape[1])

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
Since you explicitly stated that you need a generator, Option 1 is the appropriate reference.
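As a hedged illustration of the reshape options above: the reshaped array can itself be iterated over, one batch at a time, but only when the number of rows is an exact multiple of the batch size. The helper name `reshape_batches` is hypothetical and not part of the original answer; this is a minimal sketch, not a definitive implementation.

```python
import numpy as np

def reshape_batches(a, n):
    """Hypothetical helper: return an array of shape (rows // n, n, cols)."""
    if a.shape[0] % n != 0:
        # reshape cannot produce a shorter final batch
        raise ValueError("number of rows must be a multiple of n")
    return a.reshape(-1, n, a.shape[1])

a = np.arange(12).reshape(4, 3)
batches = reshape_batches(a, 2)

# Iterating over the first axis gives one (n, cols) batch at a time
for batch in batches:
    print(batch.shape)
```

If the row count is not a multiple of the batch size, a slicing-based generator is the safer choice, since it can return a shorter final batch.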
Upvotes: 4
Reputation: 3722
Here's the data that you have:
import numpy as np
full_len = 5 # In your case, 60_000
cols = 3 # In your case, 3072
nd1 = np.arange(full_len*cols).reshape(full_len,cols)
Here's what you can do, to "generate" the slices:
Option 1, Using numpy.array_split():
from math import ceil
step_size = 2 # In your case, 1_000
split_list = np.array_split(nd1,ceil(full_len/step_size), axis=0)
print (split_list)
split_list is now a list of slices into nd1. By looping over this list, you can access the individual slices as split_list[0], split_list[1], etc. Each of these slices is a view into nd1 and can be used exactly as you would use any other numpy array.
Output for Option 1:
Here's the output, showing that the last slice was a bit shorter than the other regular ones:
[array([[0, 1, 2],
       [3, 4, 5]]), array([[ 6,  7,  8],
       [ 9, 10, 11]]), array([[12, 13, 14]])]
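One caveat worth sketching: when numpy.array_split() is given a number of sections, it balances the chunk sizes, so the chunks are not guaranteed to have exactly step_size rows when the row count is not a multiple of it. Passing explicit split indices instead keeps every chunk except the last at step_size rows. The array and variable names below are illustrative assumptions, not part of the original answer.

```python
from math import ceil
import numpy as np

step_size = 3
arr = np.arange(7 * 3).reshape(7, 3)   # 7 rows, not a multiple of step_size

# Splitting by number of sections balances the chunk sizes
by_count = np.array_split(arr, ceil(arr.shape[0] / step_size), axis=0)
sizes_by_count = [chunk.shape[0] for chunk in by_count]

# Splitting at explicit row indices keeps every chunk but the last at step_size rows
split_points = np.arange(step_size, arr.shape[0], step_size)
by_index = np.array_split(arr, split_points, axis=0)
sizes_by_index = [chunk.shape[0] for chunk in by_index]

print(sizes_by_count)
print(sizes_by_index)
```

Here the first call gives chunks of 3, 2 and 2 rows, while the second gives 3, 3 and 1. For the 60000 by 1000 case in the question the two approaches coincide, because the division is exact.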
Option 2, by explicit slicing:
step_size = 2 # In your case, 1_000
myrange = range(0, full_len, step_size)
for r in myrange:
    my_slice_array = nd1[r:r+step_size]
    print(my_slice_array.shape)
Output for Option 2:
(2, 3)
(2, 3)
(1, 3)
Note that unlike slicing lists, slicing a numpy array does not make a copy of the source array's data. It only creates a view within the slice bounds, on the existing data of the source numpy array. This applies to both Option 1, and Option 2, since both involve the creation of slices.
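Since the question asks for a generator, the explicit slicing can be wrapped in one. The following is a minimal sketch under that assumption; the function name `iter_batches` and the small test array are hypothetical and chosen only for illustration.

```python
import numpy as np

def iter_batches(arr, batch_size):
    """Yield consecutive row batches of arr; the last one may be shorter."""
    for start in range(0, arr.shape[0], batch_size):
        # Basic slicing returns a view, so no data is copied here
        yield arr[start:start + batch_size]

data = np.arange(5 * 3).reshape(5, 3)   # stand-in for the (60000, 3072) array
shapes = [batch.shape for batch in iter_batches(data, 2)]
print(shapes)

# Check that a batch shares memory with the source array
first = next(iter_batches(data, 2))
print(np.shares_memory(first, data))
```

Because each batch is a view, writing into a batch also modifies the source array; call .copy() on the batch if independent data is needed.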
Upvotes: 1
Reputation: 6673
If you want a generator-based approach, the solution below works:
import numpy

bigArray = numpy.random.rand(60000, 3072) # dummy array used for illustration

def selectArray(m, n):
    # Yield consecutive batches of m rows, restricted to the first n columns
    for start in range(0, bigArray.shape[0], m):
        yield bigArray[start:start + m, :n]

genObject = selectArray(1000, 3072)
You can use either a for loop or next() to iterate over genObject.
Note: if you are using next(), make sure you handle the StopIteration exception.
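The two ways of consuming a generator mentioned above can be sketched as follows. The generator `batches` and the small array are hypothetical stand-ins so that the example runs quickly; they are not the original code.

```python
import numpy as np

def batches(arr, size):
    """Hypothetical stand-in generator yielding row batches of arr."""
    for start in range(0, arr.shape[0], size):
        yield arr[start:start + size]

small = np.zeros((5, 4))
gen = batches(small, 2)

# next() pulls one batch at a time
first = next(gen)

# A for loop (or a comprehension) consumes whatever is left
rest = [b.shape for b in gen]

# Once exhausted, next() raises StopIteration unless a default is given
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

sentinel = next(gen, None)   # returns the default instead of raising
```

A for loop handles StopIteration internally, which is why it is usually the simpler option.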
Hope it helps.
Upvotes: 0