Peter Smith

Split a large numpy array into multiple numpy arrays?

I have a large numpy array of 699,720 elements with shape (4998, 140). Here is what it looks like:

[[-0.11252183 -2.8272038  -3.773897   ...  0.12343082  0.92528623
   0.19313742]
 [-1.1008778  -3.9968398  -4.2858424  ...  0.7738197   1.1196209
  -1.4362499 ]
 [-0.567088   -2.5934503  -3.8742297  ...  0.32109663  0.9042267
  -0.4217966 ]
 ...
 [-1.1229693  -2.252925   -2.867628   ... -2.874136   -2.0083694
  -1.8083338 ]
 [-0.54770464 -1.8895451  -2.8397787  ...  1.261335    1.1504486
   0.80493224]
 [-1.3517791  -2.2090058  -2.5202248  ... -2.2600229  -1.577823
  -0.6845309 ]]

I would like to split the array into 4 separate numpy arrays, where each of the first 3 holds 30% of the array: e.g. numpyarray1 should be 0-30%, numpyarray2 should be 31-60%, numpyarray3 should be 61-90% and numpyarray4 should be 91-100% of the dataset.

Answers (1)

FabianGD

You can achieve this with numpy.split(). It accepts either an integer N (split into N equal sections) or a list of indices at which to split. Note, however, that the returned sub-arrays are views into the original array: no data is copied, which saves memory, but modifying a sub-array also modifies the original.
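A minimal sketch of both calling styles and of the view behaviour, using a small example array (the names `halves`, `parts` and `uneven` are illustrative only). It also shows numpy.array_split(), which, unlike numpy.split(), accepts an N that does not divide the axis evenly:

```python
import numpy as np

arr = np.arange(12).reshape(6, 2)  # small example array: 6 rows, 2 columns

# An integer N splits into N equal sections; it must divide the axis length evenly
halves = np.split(arr, 2)
print([h.shape for h in halves])  # [(3, 2), (3, 2)]

# A list of indices splits before each index: rows 0-1, 2-3 and 4-5
parts = np.split(arr, [2, 4])
print([p.shape for p in parts])  # [(2, 2), (2, 2), (2, 2)]

# np.array_split accepts an N that does not divide the axis evenly
uneven = np.array_split(arr, 4)
print([u.shape[0] for u in uneven])  # [2, 2, 1, 1]

# The pieces are views: they share memory with arr, so writing to one changes arr
parts[0][0, 0] = 100
print(arr[0, 0])                        # 100
print(np.shares_memory(parts[1], arr))  # True

# Call .copy() when an independent array is needed
independent = parts[2].copy()
independent[0, 0] = -1
print(arr[4, 0])  # 8 (unchanged)
```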

See this example:

import numpy as np

arr = np.random.random((100, 100))
nr_rows = arr.shape[0]

# Row indices where the first three sections end (30%, 60% and 90% of the rows).
# nr_rows // 10 is rounded down first, so when the row count is not divisible
# by 10 the last section absorbs the remainder.
section_borders = [(i+1) * 3 * (nr_rows // 10) for i in range(3)]

# Do the splitting
arr_splits = np.split(arr, section_borders)

print([sarr.shape for sarr in arr_splits])
# --> [(30, 100), (30, 100), (30, 100), (10, 100)]
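Applied to the shape in the question, a sketch that computes the borders directly from the percentages with integer arithmetic (`data` is a random stand-in for the real array, and the percentage-based border formula is an alternative to the one above, not part of the original answer). With 4998 rows, `nr_rows // 10` gives borders 1497, 2994 and 4491 and a 507-row last section; computing `n * pct // 100` keeps each border within one row of the exact percentage:

```python
import numpy as np

# Random stand-in with the shape from the question
data = np.random.random((4998, 140))
n = data.shape[0]

# Borders at 30%, 60% and 90% of the rows; integer arithmetic rounds each
# border down to a whole row without floating-point surprises
borders = [n * pct // 100 for pct in (30, 60, 90)]
print(borders)  # [1499, 2998, 4498]

numpyarray1, numpyarray2, numpyarray3, numpyarray4 = np.split(data, borders)
print([a.shape for a in (numpyarray1, numpyarray2, numpyarray3, numpyarray4)])
# [(1499, 140), (1499, 140), (1500, 140), (500, 140)]
```

Because of the rounding, the first three sections differ by at most one row rather than being exactly equal.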

To split along columns instead, pass axis=1 to numpy.split().
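A short sketch of a column split on the same 100 x 100 example; numpy.hsplit() is shown as the equivalent shorthand for 2-D arrays:

```python
import numpy as np

arr = np.random.random((100, 100))

# axis=1 splits the columns instead of the rows
col_splits = np.split(arr, [30, 60, 90], axis=1)
print([c.shape for c in col_splits])  # [(100, 30), (100, 30), (100, 30), (100, 10)]

# np.hsplit(arr, indices) performs the same column split
h_splits = np.hsplit(arr, [30, 60, 90])
print(all(np.array_equal(a, b) for a, b in zip(col_splits, h_splits)))  # True
```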
