Reputation: 2978
I have a numpy array that contains 813698 rows:
len(df_numpy)
Out[55]: 813698
I want to loop through this array using mini batches of 5000.
mini_batch = 5000
i = 0
for each batch in df_numpy:
    mysubset = df_numpy[i:mini_batch+i]
    # …
    i = i + mini_batch
The problem is that (len(df_numpy)-1)/mini_batch is not an integer, so the last mini batch does not contain 5000 rows.
How can I loop through df_numpy so that all records of df_numpy are included?
Upvotes: 0
Views: 1256
Reputation: 13743
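This code should get the job done. Slicing past the end of a NumPy array does not raise an error; the slice is simply cut off at the last element, so the final batch holds whatever rows remain: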
mini_batch = 5000
for first in range(0, len(df_numpy), mini_batch):
    mysubset = df_numpy[first:first+mini_batch]
    # ...
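As a rough sanity check for the sizes in the question, here is a sketch that runs the same loop over a placeholder array of zeros with 813698 rows (an assumption, since the real df_numpy is not shown) and records the length of each batch:

```python
import math
import numpy as np

# Placeholder data: only the length of the real df_numpy is known.
df_numpy = np.zeros(813698)
mini_batch = 5000

# Length of every slice produced by the loop from the answer.
batch_sizes = [
    len(df_numpy[first:first + mini_batch])
    for first in range(0, len(df_numpy), mini_batch)
]

print(len(batch_sizes))  # 163, i.e. math.ceil(813698 / 5000)
print(batch_sizes[-1])   # 3698 rows in the final, shorter batch
print(sum(batch_sizes))  # 813698, so every row is visited exactly once
```

The first 162 batches have 5000 rows each, and the last one picks up the remaining 3698.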
In [2]: import numpy as np
In [3]: df_numpy = np.arange(13)
In [4]: mini_batch = 5
In [5]: for first in range(0, len(df_numpy), mini_batch):
   ...:     mysubset = df_numpy[first:first+mini_batch]
   ...:     print(mysubset)
[0 1 2 3 4]
[5 6 7 8 9]
[10 11 12]
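If the batching is needed in more than one place, the same loop could be wrapped in a small generator. This is only a sketch: the name iter_batches is made up for illustration and is not part of NumPy.

```python
import numpy as np

def iter_batches(array, batch_size):
    """Yield consecutive slices of `array` with at most `batch_size` rows each."""
    # Hypothetical helper: same range/slice pattern as the loop above.
    for first in range(0, len(array), batch_size):
        yield array[first:first + batch_size]

df_numpy = np.arange(13)
batches = [batch.tolist() for batch in iter_batches(df_numpy, 5)]
print(batches)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11, 12]]
```

Because basic slicing returns views, each batch refers to the original array's memory rather than copying it.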
Upvotes: 2