I have a large NumPy array of shape (4998, 140), i.e. 699720 elements in total. Here is what it looks like:
[[-0.11252183 -2.8272038 -3.773897 ... 0.12343082 0.92528623 0.19313742]
 [-1.1008778 -3.9968398 -4.2858424 ... 0.7738197 1.1196209 -1.4362499 ]
 [-0.567088 -2.5934503 -3.8742297 ... 0.32109663 0.9042267 -0.4217966 ]
 ...
 [-1.1229693 -2.252925 -2.867628 ... -2.874136 -2.0083694 -1.8083338 ]
 [-0.54770464 -1.8895451 -2.8397787 ... 1.261335 1.1504486 0.80493224]
 [-1.3517791 -2.2090058 -2.5202248 ... -2.2600229 -1.577823 -0.6845309 ]]
I would like to split the array into 4 separate NumPy arrays, where the first 3 each hold 30% of the rows: numpyarray1 should be 0-30%, numpyarray2 30-60%, numpyarray3 60-90% and numpyarray4 90-100% of the dataset.
You can achieve this with numpy.split(), which splits an array either into a given number of equal sections or at a list of indices you choose. Note that the returned sub-arrays are views on the original array: no data is copied (which saves memory), but writing to a sub-array also changes the original.
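A small sketch of what the view behaviour means in practice, using a made-up (10, 2) toy array and arbitrary split points: np.shares_memory confirms that every piece reuses the original buffer, and .copy() gives you a piece that is independent of it.

```python
import numpy as np

# Toy array and split points chosen only for illustration
arr = np.arange(20).reshape(10, 2)
parts = np.split(arr, [3, 6, 9])

# Each piece shares its memory with arr: no data was copied
print(all(np.shares_memory(p, arr) for p in parts))  # True

# Writing into a piece therefore also changes the original array
parts[0][0, 0] = -1
print(arr[0, 0])  # -1

# .copy() detaches a piece, so later writes leave arr untouched
independent = parts[1].copy()
independent[0, 0] = 100
print(arr[3, 0])  # 6
```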
See this example:
import numpy as np
arr = np.random.random((100, 100))
nr_rows = arr.shape[0]
# Get the indices for the first three sections
# Borders at 30%, 60% and 90% of the rows; nr_rows // 10 rounds down,
# so if nr_rows is not divisible by 10 the last section absorbs the remainder
section_borders = [(i+1) * 3 * (nr_rows // 10) for i in range(3)]
# Do the splitting
arr_splits = np.split(arr, section_borders)
print([sarr.shape for sarr in arr_splits])
# --> [(30, 100), (30, 100), (30, 100), (10, 100)]
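For the question's 4998 rows, nr_rows // 10 is 499, so the borders above come out as 1497, 2994 and 4491 and the last piece gets 507 rows. A sketch that instead computes each border as a percentage of the row count with integer arithmetic, using random data as a stand-in for the real (4998, 140) array (the variable names mirror the ones in the question):

```python
import numpy as np

# Stand-in for the question's data: same shape, random values
arr = np.random.random((4998, 140))
nr_rows = arr.shape[0]

# Borders at 30%, 60% and 90% of the rows, using integer math
# (nr_rows * pct // 100) to avoid floating-point rounding surprises
section_borders = [nr_rows * pct // 100 for pct in (30, 60, 90)]
print(section_borders)  # [1499, 2998, 4498]

numpyarray1, numpyarray2, numpyarray3, numpyarray4 = np.split(arr, section_borders)
print([a.shape for a in (numpyarray1, numpyarray2, numpyarray3, numpyarray4)])
# [(1499, 140), (1499, 140), (1500, 140), (500, 140)]
```

Each border is still rounded down, so every piece is within one row of its exact percentage.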
To split along the columns instead, pass axis=1 to np.split().
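The same idea along the columns, as a sketch on the answer's (100, 100) example array: only the dimension used to compute the borders and the axis argument change.

```python
import numpy as np

arr = np.random.random((100, 100))
nr_cols = arr.shape[1]

# 30/30/30/10 borders counted along the columns instead of the rows
col_borders = [nr_cols * pct // 100 for pct in (30, 60, 90)]
col_splits = np.split(arr, col_borders, axis=1)
print([s.shape for s in col_splits])
# [(100, 30), (100, 30), (100, 30), (100, 10)]
```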