Reputation: 4725
There is this How do you split a list into evenly sized chunks? for splitting an array into chunks. Is there anyway to do this more efficiently for giant arrays using Numpy?
Upvotes: 113
Views: 206141
Reputation: 888
np.array_split will try to split "evenly", for example, if x.shape is 10, sections is 3, you will get splits with shape [3, 3, 2, 2] instead of [3, 3, 3, 1], a workaround is using spaced indices like snippet below
import math
import numpy as np
def split_evenly(x, chunk_size, axis=0):
return np.array_split(x, math.ceil(x.shape[axis] / chunk_size), axis=axis)
def split_reminder(x, chunk_size, axis=0):
indices = np.arange(chunk_size, x.shape[axis], chunk_size)
return np.array_split(x, indices, axis)
x = np.arange(10)
chunk_size = 3
print([i.shape[0] for i in split_evenly(x, chunk_size, 0)])
print([i.shape[0] for i in split_reminder(x, chunk_size, 0)])
# [3, 3, 2, 2]
# [3, 3, 3, 1]
Upvotes: 3
Reputation: 3623
This can be achieved using as_strided
of numpy. I have put a spin to answer by assuming that if chunk size is not a factor of total number of rows, then rest of the rows in the last batch will be filled with zeros.
from numpy.lib.stride_tricks import as_strided
def batch_data(test, chunk_count):
m,n = test.shape
S = test.itemsize
if not chunk_count:
chunk_count = 1
batch_size = m//chunk_count
# Batches which can be covered fully
test_batches = as_strided(test, shape=(chunk_count, batch_size, n), strides=(batch_size*n*S,n*S,S)).copy()
covered = chunk_count*batch_size
if covered < m:
rest = test[covered:,:]
rm, rn = rest.shape
mismatch = batch_size - rm
last_batch = np.vstack((rest,np.zeros((mismatch,rn)))).reshape(1,-1,n)
return np.vstack((test_batches,last_batch))
return test_batches
This is based on my answer https://stackoverflow.com/a/68238815/5462372.
Upvotes: 0
Reputation: 35636
How about this? Here you split the array using the length you want to have.
a = np.random.randint(0,10,[4,4])
a
Out[27]:
array([[1, 5, 8, 7],
[3, 2, 4, 0],
[7, 7, 6, 2],
[7, 4, 3, 0]])
a[0:2,:]
Out[28]:
array([[1, 5, 8, 7],
[3, 2, 4, 0]])
a[2:4,:]
Out[29]:
array([[7, 7, 6, 2],
[7, 4, 3, 0]])
Upvotes: 0
Reputation: 15345
Just some examples on usage of array_split
, split
, hsplit
and vsplit
:
n [9]: a = np.random.randint(0,10,[4,4])
In [10]: a
Out[10]:
array([[2, 2, 7, 1],
[5, 0, 3, 1],
[2, 9, 8, 8],
[5, 7, 7, 6]])
Some examples on using array_split
:
If you give an array or list as second argument you basically give the indices (before) which to 'cut'
# split rows into 0|1 2|3
In [4]: np.array_split(a, [1,3])
Out[4]:
[array([[2, 2, 7, 1]]),
array([[5, 0, 3, 1],
[2, 9, 8, 8]]),
array([[5, 7, 7, 6]])]
# split columns into 0| 1 2 3
In [5]: np.array_split(a, [1], axis=1)
Out[5]:
[array([[2],
[5],
[2],
[5]]),
array([[2, 7, 1],
[0, 3, 1],
[9, 8, 8],
[7, 7, 6]])]
An integer as second arg. specifies the number of equal chunks:
In [6]: np.array_split(a, 2, axis=1)
Out[6]:
[array([[2, 2],
[5, 0],
[2, 9],
[5, 7]]),
array([[7, 1],
[3, 1],
[8, 8],
[7, 6]])]
split
works the same but raises an exception if an equal split is not possible
In addition to array_split
you can use shortcuts vsplit
and hsplit
.
vsplit
and hsplit
are pretty much self-explanatry:
In [11]: np.vsplit(a, 2)
Out[11]:
[array([[2, 2, 7, 1],
[5, 0, 3, 1]]),
array([[2, 9, 8, 8],
[5, 7, 7, 6]])]
In [12]: np.hsplit(a, 2)
Out[12]:
[array([[2, 2],
[5, 0],
[2, 9],
[5, 7]]),
array([[7, 1],
[3, 1],
[8, 8],
[7, 6]])]
Upvotes: 28
Reputation: 22509
Try numpy.array_split
.
From the documentation:
>>> x = np.arange(8.0)
>>> np.array_split(x, 3)
[array([ 0., 1., 2.]), array([ 3., 4., 5.]), array([ 6., 7.])]
Identical to numpy.split
, but won't raise an exception if the groups aren't equal length.
If number of chunks > len(array) you get blank arrays nested inside, to address that - if your split array is saved in a
, then you can remove empty arrays by:
[x for x in a if x.size > 0]
Just save that back in a
if you wish.
Upvotes: 161
Reputation: 67427
Not quite an answer, but a long comment with nice formatting of code to the other (correct) answers. If you try the following, you will see that what you are getting are views of the original array, not copies, and that was not the case for the accepted answer in the question you link. Be aware of the possible side effects!
>>> x = np.arange(9.0)
>>> a,b,c = np.split(x, 3)
>>> a
array([ 0., 1., 2.])
>>> a[1] = 8
>>> a
array([ 0., 8., 2.])
>>> x
array([ 0., 8., 2., 3., 4., 5., 6., 7., 8.])
>>> def chunks(l, n):
... """ Yield successive n-sized chunks from l.
... """
... for i in xrange(0, len(l), n):
... yield l[i:i+n]
...
>>> l = range(9)
>>> a,b,c = chunks(l, 3)
>>> a
[0, 1, 2]
>>> a[1] = 8
>>> a
[0, 8, 2]
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8]
Upvotes: 10
Reputation: 309841
I believe that you're looking for numpy.split
or possibly numpy.array_split
if the number of sections doesn't need to divide the size of the array properly.
Upvotes: 10