Reputation: 305
I want to loop over my files in batches, taking 32 images from each sub-directory at a time (I can't load all the images at once because of memory limits). For example: load images 1-32 of every directory, use them, then load images 33-64, then 65-96, and so on.
My directory:
Rootdir
- dir1
  - img 1
  - img 2
  - img...
- dir2
  - img 5000001
  - img 5000002
  - img...
- dir3
  - img 10000001
  - img 10000002
  - img...
So I would need to load img 1, 2, ..., 32, 5000001, ..., 5000032, 10000001, ..., 10000032 in the first iteration, then img 33, 34, ..., 64, 5000033, ..., 5000064, 10000033, ..., 10000064 in the second iteration.
Is there a way to do this properly?
I am trying os.walk, which lets me loop over my directory, but I don't see how to adapt this loop to produce the batches of 32 that I need:
import os

imgs = []
for dirName, subdirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in sorted(fileList):
        img_path = os.path.join(dirName, fname)
        try:
            img = load_img(img_path, target_size=None)
            imgs.append(img)
        except Exception as e:
            print(str(e), fname)
# do something on imgs
EDIT
All of the suggestions so far give me something like this:
dir1/img1.jpg to dir1/img32.jpg then dir1/img33.jpg to dir1/img64.jpg then ...
then dir2/img1.jpg to dir2/img32.jpg then dir2/img33.jpg to dir2/img64.jpg then ...
then dir3/img1.jpg to dir3/img32.jpg then dir3/img33.jpg to dir3/img64.jpg :(
What I'm trying to achieve is:
Files 1 to 32 of dir1 + files 1 to 32 of dir2 + files 1 to 32 of dir3, then
files 33 to 64 of dir1 + files 33 to 64 of dir2 + files 33 to 64 of dir3, all in the same loop.
Upvotes: 1
Views: 2427
Reputation: 305
Okay, I found a way. It's not the most beautiful, but here it is: I use a set to track which files I have already seen, and I skip those files so they don't count toward the current batch.
import os
import cv2
import numpy as np

# rootdir, data_number, img_width, img_height and pbar are defined elsewhere
number_of_directory = 17
batch_size = 32
seen = set()
for overall_count in pbar(range(data_number // (batch_size * number_of_directory))):
    imgs = []
    for dirName, subdirList, fileList in os.walk(rootdir):
        count = 0
        for fname in sorted(fileList):
            if fname in seen:
                continue  # already used in an earlier batch
            if count == batch_size:
                break  # this directory has contributed its share
            img_path = os.path.join(dirName, fname)
            try:
                img = cv2.imread(img_path, cv2.IMREAD_COLOR)
                img = cv2.resize(img, (img_width, img_height))
                imgs.append(np.array(img))
            except Exception as e:
                print(str(e), fname)
            seen.add(fname)
            count += 1
    # Do something with images
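The loop above walks the whole tree again for every batch and relies on file names being unique across directories. A minimal alternative sketch, assuming it is acceptable to keep the list of paths (not the images) in memory: scan the tree once, sort each directory's files, and take the same slice from every directory on each step. The function name `interleaved_batches` is made up for this illustration. Note that `sorted` orders names lexicographically, so `img10` sorts before `img2` unless the numbers are zero-padded.

```python
import os


def interleaved_batches(rootdir, batch_size):
    """Yield lists of paths: the first `batch_size` files of every
    sub-directory, then the next `batch_size` of every sub-directory, etc."""
    files_per_dir = []
    for dirpath, dirnames, filenames in os.walk(rootdir):
        dirnames.sort()  # visit sub-directories in a deterministic order
        if filenames:
            files_per_dir.append(
                [os.path.join(dirpath, name) for name in sorted(filenames)]
            )

    longest = max((len(files) for files in files_per_dir), default=0)
    for start in range(0, longest, batch_size):
        batch = []
        for files in files_per_dir:
            # Directories with fewer files simply contribute nothing here.
            batch.extend(files[start:start + batch_size])
        yield batch
```

Each yielded list can then be passed to whatever image loader is in use (for example the `cv2.imread` call above), so only one batch of images is held in memory at a time.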
Upvotes: 0
Reputation: 5853
os.walk already returns a generator that yields (dirpath, dirnames, filenames) 3-tuples on the fly, so you just need to yield slices of the filenames list in batches of 32.
This is an example:
import os

# Your root directory path
rootdir = r"Root"
# Your batch size
batch_size = 32

def walk_dirs(directory, batch_size):
    walk_dirs_generator = os.walk(directory)
    for dirname, subdirectories, filenames in walk_dirs_generator:
        for i in range(0, len(filenames), batch_size):
            # slice the filenames list: 0-31, 32-63 and so on
            yield [os.path.join(dirname, filename) for filename in filenames[i:i+batch_size]]

# Finally, iterate over walk_dirs, which itself returns a generator
for file_name_batch in walk_dirs(rootdir, batch_size):
    for file_name in file_name_batch:
        # Do some processing on the batch now
        print(file_name)
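As written, `walk_dirs` finishes all batches of one directory before moving to the next. If the batches should instead combine the next 32 files of every directory, one possible sketch is to build one batch generator per directory and advance them together with `itertools.zip_longest`. The helper names `dir_batches` and `round_robin_batches` are hypothetical, and sorting the file names is an added assumption so that "files 1 to 32" is well defined.

```python
import os
from itertools import zip_longest


def dir_batches(dirpath, filenames, batch_size):
    """Yield successive batches of full paths from a single directory."""
    names = sorted(filenames)
    for i in range(0, len(names), batch_size):
        yield [os.path.join(dirpath, n) for n in names[i:i + batch_size]]


def round_robin_batches(rootdir, batch_size):
    """Yield one combined batch per round: the next batch of every directory."""
    per_dir = [
        dir_batches(dirpath, filenames, batch_size)
        for dirpath, _, filenames in os.walk(rootdir)
        if filenames
    ]
    # zip_longest keeps going until the largest directory is exhausted;
    # directories that run out early contribute an empty list.
    for batches_this_round in zip_longest(*per_dir, fillvalue=[]):
        yield [path for batch in batches_this_round for path in batch]
```

The order in which directories appear inside a combined batch follows `os.walk`, which is not guaranteed to be alphabetical.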
Upvotes: 3
Reputation: 8579
What about always using the same imgs list and processing it as soon as it holds 32 images?
imgs = []
for dirName, subdirList, fileList in os.walk('c:\\Java\\'):
    print('Found directory: %s' % dirName)
    for fname in sorted(fileList):
        img_path = os.path.join(dirName, fname)
        try:
            img = load_img(img_path, target_size=None)
            imgs.append(img)
            if len(imgs) == 32:
                print("Doing what I have to with current imgs list (add your function here)")
                imgs = []  # reset the list for the next batch
        except Exception as e:
            print(str(e))
# do something on imgs
If you need to keep track of all the previous lists, you can simply copy the list content over.
Let me know if you want that implementation too.
Upvotes: 0
Reputation: 443
You could take a look at os.walk()
EDIT: simple counter example
counter = 0
todo_list = []
for x in mylist:
    # do something with x
    todo_list.append(x)
    counter += 1
    if counter % 32 == 0:
        # do something with todo list
        todo_list = []  # empty todo list for next batch
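One thing to watch with the counter loop: when the number of items is not a multiple of 32, the last few items stay in the list and are never processed. A small sketch of a generic helper that also yields the final, shorter batch, using `itertools.islice` (the name `chunked` is just for illustration):

```python
from itertools import islice


def chunked(iterable, size):
    """Yield lists of up to `size` items; the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return  # the iterator is exhausted
        yield chunk


for batch in chunked(range(7), 3):
    print(batch)
# prints [0, 1, 2], then [3, 4, 5], then [6]
```

With this, `for batch in chunked(mylist, 32):` replaces the manual counter and reset.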
Upvotes: 0