Hadrien Berthier

Reputation: 305

Python loop over batch of files

I want to loop over a batch of files so that I get 32 images from each sub-directory at a time (I can't load all the images at once because of memory limits). For example: load images 1-32 of every directory, use them, then load images 33-64, then 65-96, and so on.

My directory:

Rootdir
  - dir1
    - img 1
    - img 2
    - img...
  - dir2
    - img 5000001
    - img 5000002
    - img...
  - dir3
    - img 10000001
    - img 10000002
    - img...

So I would need to load img 1, 2, ..., 32, 5000001, ..., 5000032, 10000001, ..., 10000032 in the first iteration, then img 33, 34, ..., 64, 5000033, ..., 5000064, 10000033, ..., 10000064 in the second iteration.

Is there a way to do this properly?

I am trying os.walk, which lets me loop over my directory, but I don't see how to adapt this loop to the batches of 32 I need.

imgs = []
for dirName, subdirList, fileList in os.walk(rootdir):
    print('Found directory: %s' % dirName)
    for fname in sorted(fileList):
        img_path = os.path.join(dirName, fname)
        try:
            img = load_img(img_path, target_size=None)
            imgs.append(img)
        except Exception as e:
            print(str(e), fname)
    # do something on imgs

EDIT

All of your suggestions give me something like this:

dir1/img1.jpg to dir1/img32.jpg then dir1/img33.jpg to dir1/img64.jpg then ...

then dir2/img1.jpg to dir2/img32.jpg then dir2/img33.jpg to dir2/img64.jpg then ...

then dir3/img1.jpg to dir3/img32.jpg then dir3/img33.jpg to dir3/img64.jpg :(

What I'm trying to achieve is:

Files 1 to 32 of dir1 + files 1 to 32 of dir2 + files 1 to 32 of dir3, then

files 33 to 64 of dir1 + files 33 to 64 of dir2 + files 33 to 64 of dir3, all in the same loop.

Upvotes: 1

Views: 2427

Answers (4)

Hadrien Berthier

Reputation: 305

Okay, I found a way. It's not the most beautiful, but here it is: I use a set to record which files I have already seen, and I skip those so they don't count toward the current batch.

number_of_directory = 17
batch_size = 32
seen = set()  # paths of files already used in an earlier batch
for overall_count in pbar(range(data_number // (batch_size * number_of_directory))):
    imgs = []
    for dirName, subdirList, fileList in os.walk(rootdir):
        count = 0
        for fname in sorted(fileList):
            img_path = os.path.join(dirName, fname)
            if img_path in seen:
                continue
            if count == batch_size:
                break
            try:
                img = cv2.imread(img_path, cv2.IMREAD_COLOR)
                img = cv2.resize(img, (img_width, img_height))
                imgs.append(np.array(img))
            except Exception as e:
                print(str(e), fname)
            seen.add(img_path)
            count += 1
    # Do something with images
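
For comparison, here is a minimal sketch of another way to get the same interleaved batches without re-walking the tree on every pass. It lists the files of each sub-directory once, then slices every list at the same offset. The helper names list_files_per_dir and interleaved_batches are hypothetical, and the sketch assumes the images sit directly inside the immediate sub-directories of rootdir. It only yields paths, so loading the images is left to the caller.

```python
import os

def list_files_per_dir(rootdir):
    """Map each immediate sub-directory of rootdir to its sorted file paths."""
    files_per_dir = {}
    for entry in sorted(os.scandir(rootdir), key=lambda e: e.name):
        if entry.is_dir():
            files_per_dir[entry.path] = sorted(
                f.path for f in os.scandir(entry.path) if f.is_file()
            )
    return files_per_dir

def interleaved_batches(rootdir, batch_size=32):
    """Yield lists of paths: one slice of batch_size files from every sub-directory."""
    files_per_dir = list_files_per_dir(rootdir)
    longest = max((len(paths) for paths in files_per_dir.values()), default=0)
    for start in range(0, longest, batch_size):
        batch = []
        for paths in files_per_dir.values():
            # Shorter directories simply contribute fewer (or no) files here.
            batch.extend(paths[start:start + batch_size])
        yield batch
```

Each yielded batch holds up to batch_size paths per sub-directory, so with 17 directories and a batch size of 32 that is at most 544 paths. Note that sorted() orders names lexicographically, so "img10" comes before "img2" unless the numbers are zero-padded.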

Upvotes: 0

Kunal Mukherjee

Reputation: 5853

os.walk already returns a generator that yields (dirpath, dirnames, filenames) 3-tuples on the fly, so you just need to yield slices of the filenames list in batches of 32.


This is an example:

import os

# Your root directory path
rootdir = r"Root"

#Your batch size
batch_size = 32

def walk_dirs(directory, batch_size):
    walk_dirs_generator = os.walk(directory)
    for dirname, subdirectories, filenames in walk_dirs_generator:
        for i in range(0, len(filenames), batch_size):
            # slice the filenames list: 0-31, 32-63 and so on
            yield [os.path.join(dirname, filename) for filename in filenames[i:i+batch_size]]

# Finally, iterate over walk_dirs, which is a generator function
for file_name_batch in walk_dirs(rootdir, batch_size):
    for file_name in file_name_batch:
        # Do some processing on each file of the batch
        print(file_name)

Upvotes: 3

Pitto

Reputation: 8579

What about always using the same imgs list and processing it as soon as it holds 32 images?

imgs = []
for dirName, subdirList, fileList in os.walk('c:\\Java\\'):
    print('Found directory: %s' % dirName)
    for fname in sorted(fileList):
        img_path = os.path.join(dirName, fname)
        try:
            img = load_img(img_path, target_size=None)
            imgs.append(img)
            if len(imgs) == 32:
                print("Doing what I have to with current imgs list (add your function here)")
                imgs = []  # reset the list for the next batch
        except Exception as e:
            print(str(e))
    # do something on imgs

If you need to keep track of all the previous lists, you can simply copy the list contents over.

Let me know if you want that implementation too.

Upvotes: 0

Georges Lorré

Reputation: 443

You could take a look at os.walk()

EDIT: simple counter example

counter = 0
todo_list = []
for x in mylist:
    # do something with x
    todo_list.append(x)
    counter += 1
    if counter % 32 == 0:
        # do something with todo list
        todo_list = []  # empty todo list for next batch
Upvotes: 0
