Reputation: 19
I have a dataset of images, each containing a 1- to 5-letter word. I want to use deep learning to classify the characters that make up the word in each image. The labels for these images are formatted as follows:
totalcharacter_indexoffirstchar_indexofsecondchar_.._indexoflastchar
I'm trying to load these images with TensorFlow data pipelines because of memory constraints. Below is my code for loading and processing the images and labels from the directory:
import tensorflow as tf

def process_img(file_path):
    # get_label, encode_label and urdu_alphabets are defined elsewhere
    label = get_label(file_path)
    image = tf.io.read_file(file_path)
    image = tf.image.decode_png(image, channels=1)
    image = tf.image.convert_image_dtype(image, tf.float32)
    target_shape = [695, 1204]
    image = tf.image.resize_with_crop_or_pad(image, target_shape[0], target_shape[1])
    # Encode the label
    encoded_label = tf.py_function(func=encode_label, inp=[label], Tout=tf.float32)
    encoded_label.set_shape([5, len(urdu_alphabets)])
    return image, encoded_label
input_dir = '/kaggle/input/dataset/Data/*'
images_ds = tf.data.Dataset.list_files(input_dir, shuffle=True)
train_count = int(tf.math.round(len(images_ds) * 0.8))
train_ds = images_ds.take(train_count)
test_ds = images_ds.skip(train_count)
train_ds = train_ds.map(process_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test_ds = test_ds.map(process_img, num_parallel_calls=tf.data.experimental.AUTOTUNE)
test_ds = test_ds.batch(32)
train_ds = train_ds.cache()
test_ds = test_ds.cache()
train_ds = train_ds.shuffle(len(train_ds))
test_ds = test_ds.prefetch(tf.data.AUTOTUNE)
print(train_ds)
print(test_ds)
The train_ds looks like this:
<_PrefetchDataset element_spec=(TensorSpec(shape=(None, 695, 1204, 1), dtype=tf.float32, name=None), TensorSpec(shape=(None, 5, 39), dtype=tf.float32, name=None))>
Now, I want to apply simple augmentations on the images such as rotation, shear, erosion, and dilation. I initially used the following function:
def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    image = tf.keras.preprocessing.image.random_rotation(image, rg=15, row_axis=0, col_axis=1, channel_axis=2, fill_mode='nearest', cval=0.0, interpolation_order=1)
    image = tf.image.random_zoom(image, [0.85, 0.85])
    image = tf.image.random_shear(image, 0.3)
    image = tf.image.random_shift(image, 0.1, 0.1)
    return image, label
train_augmented_ds = train_ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
train_augmented_ds = train_augmented_ds.prefetch(buffer_size=tf.data.AUTOTUNE)
However, many of these functions are deprecated, and some (random_zoom, random_shear, random_shift) actually belong to the deprecated tf.keras.preprocessing.image module rather than tf.image. How can I apply these augmentations to images in a TensorFlow pipeline efficiently?
Note: I could perform these augmentations outside the TensorFlow pipeline on NumPy arrays, but my dataset is very large (1.1 million images), so I need an efficient way to do this.
Upvotes: 0
Views: 61
Reputation: 1
You can use ImageDataGenerator:
from keras.preprocessing.image import ImageDataGenerator
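For example, here is a minimal sketch of how it could be configured to mirror the augmentations from the question. The rotation/shear/zoom/shift values are illustrative, and x_train / y_train are assumed NumPy arrays, since ImageDataGenerator does not plug directly into a tf.data pipeline:

from keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings, roughly matching the question
datagen = ImageDataGenerator(
    rotation_range=15,       # random rotation up to +/- 15 degrees
    shear_range=0.3,         # random shear intensity
    zoom_range=0.15,         # random zoom in/out
    width_shift_range=0.1,   # random horizontal shift
    height_shift_range=0.1,  # random vertical shift
    fill_mode='nearest',
    rescale=1.0 / 255.0,
)

# x_train / y_train: assumed NumPy arrays of images and encoded labels
# train_gen = datagen.flow(x_train, y_train, batch_size=32)
# model.fit(train_gen, epochs=10)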
Upvotes: 0
Reputation: 1833
You can use Keras preprocessing layers, e.g. the RandomRotation layer. I think every operation you listed except for random shear is available as a TensorFlow layer now. Random shear is available as a layer in the keras-cv package.
You could add these layers at the beginning of your model directly, or create a separate model with these preprocessing layers, which you can add as a sub-model. By default, these augmentations are only applied during training, so your test set (or the training set in model.evaluate(train)) will not be affected.
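A minimal sketch of this approach, assuming TensorFlow 2.x preprocessing layers plus the optional keras_cv package for shear (the layer choices and factors below are illustrative, not tuned for your data):

import tensorflow as tf
import keras_cv  # only needed for RandomShear

# Augmentation sub-model; these layers are only active when called with training=True
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal_and_vertical"),
    tf.keras.layers.RandomRotation(15 / 360),  # ~15 degrees, as a fraction of a full turn
    tf.keras.layers.RandomZoom(height_factor=0.15),
    tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
    keras_cv.layers.RandomShear(x_factor=0.3, y_factor=0.3),
])

# Option 1: put the augmentation in front of your network
model = tf.keras.Sequential([
    tf.keras.Input(shape=(695, 1204, 1)),
    data_augmentation,
    # ... rest of the model ...
])

# Option 2: apply it inside the tf.data pipeline instead
# train_augmented_ds = train_ds.map(
#     lambda image, label: (data_augmentation(image, training=True), label),
#     num_parallel_calls=tf.data.AUTOTUNE,
# ).prefetch(tf.data.AUTOTUNE)

Option 1 keeps augmentation inside the model (and on the GPU, if one is used) and disables it automatically at inference time; option 2 keeps the model graph clean, but you have to pass training=True yourself.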
Upvotes: 0