이준혁

Reputation: 277

Does training in batches affect model accuracy when training a CNN with TensorFlow?

I'm dealing with a large number of images.
However, when I decompress them with Python Pillow, Python stops because it runs out of memory.
So I sliced the images into groups and trained the model on one group at a time.
For example:

  1. full data: 1.jpg, 2.jpg ~ 100.jpg
    => train for 50 epochs
  2. sliced data: batch 1 -> 1.jpg ~ 10.jpg, batch 2 -> 11.jpg ~ 20.jpg, ..., batch 10
    => train for 50 epochs on batch 1, then 50 epochs on batch 2, ...

Is there any difference in the accuracy of the resulting model?
Thanks.

Upvotes: 0

Views: 171

Answers (1)

Gerry P

Reputation: 8092

Image data tends to take up a lot of memory. Typically, if you try to process all the images as one big data set, you will get an out-of-memory error. To deal with this, the data is split into "batches" of images that are provided sequentially as input to the model during training. Generators are typically employed to accomplish this.

For example, assume you have 10,000 training images of, say, 300 by 300 RGB pixels. That is a very large amount of data. So instead of trying to process all 10,000 images at once, you can break them up into batches. batch_size defines how many images are processed and stored in memory at any one time. For example, if you set a batch_size of 50, it takes 200 sequential batches to process all the images for one training epoch. That 200 is what is called steps_per_epoch in model.fit.

You can create your own generator if you wish, but Keras provides several APIs that can do that for you. Documentation for that is here. I prefer to use ImageDataGenerator.flow_from_directory, which provides the required batch generation and also image augmentation.

Batch size can have an impact on model performance to a degree, and it also affects training duration. Small batch sizes typically make training take longer and result in more variance in the metrics at each epoch, but they have some advantage with respect to avoiding getting stuck in local minima. Larger batch sizes tend to reduce overall training duration. I usually do a few runs with various batch sizes to see if the model performance changes significantly.
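To make the arithmetic above concrete, here is a minimal sketch in plain Python. It is not the Keras API: the helper names (`dataset_bytes`, `steps_per_epoch`, `batch_generator`) and the file names are hypothetical, and the memory estimate assumes uncompressed images at one byte per channel value. It only illustrates how a generator hands out one batch at a time so that the full data set never has to sit in memory.

```python
import math


def dataset_bytes(num_images, height, width, channels=3, bytes_per_value=1):
    """Rough size of the decoded images if all were held in memory at once.

    Assumes uncompressed pixels; bytes_per_value=1 is uint8, 4 is float32.
    """
    return num_images * height * width * channels * bytes_per_value


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be smaller)."""
    return math.ceil(num_images / batch_size)


def batch_generator(paths, batch_size):
    """Yield successive slices of `paths`, one batch per iteration.

    A real generator would open and decode only the files in the current
    batch here; this sketch yields the file names themselves.
    """
    for start in range(0, len(paths), batch_size):
        yield paths[start:start + batch_size]


# Hypothetical file names standing in for the 10,000 training images.
paths = [f"{i}.jpg" for i in range(1, 10001)]

print(dataset_bytes(len(paths), 300, 300))   # bytes for all images as uint8
print(steps_per_epoch(len(paths), 50))       # batches per epoch at batch_size=50

first_batch = next(batch_generator(paths, 50))
print(first_batch[0], first_batch[-1], len(first_batch))
```

Note that in this scheme every epoch still iterates over all batches, so the model sees the whole data set once per epoch while only one batch is in memory at a time.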

Upvotes: 1
