Reputation: 889
I built a simple CNN model and it raised below errors:
Epoch 1/10
235/235 [==============================] - ETA: 0s - loss: 540.2643 - accuracy: 0.4358
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-14-ab88232c98aa> in <module>()
15 train_ds,
16 validation_data=val_ds,
---> 17 epochs=epochs
18 )
7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
InvalidArgumentError: Unknown image file format. One of JPEG, PNG, GIF, BMP required.
[[{{node decode_image/DecodeImage}}]]
[[IteratorGetNext]] [Op:__inference_test_function_2924]
Function call stack:
test_function
The code I wrote is quite simple and standard; most of it is copied directly from the official website. It raised this error before the first epoch finished. I am pretty sure the images are all PNG files, and the train folder does not contain anything except images (no text or code files). I am using Colab, and the TensorFlow version
is 2.5.0. I appreciate any help.
data_dir = './train'

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    subset='training',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    subset='validation',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

model = Sequential([
    layers.InputLayer(input_shape=(image_size, image_size, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(
    optimizer=optimizer,
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
Upvotes: 10
Views: 20694
Reputation: 1
Lescruel's answer saved my butt! I made a small amendment in case you want to automatically remove the images that are not usable:
from pathlib import Path
import imghdr
import os

# Directory containing the images (adjust to your dataset directory)
extract_path = "./train"

# File extensions to treat as images
image_extensions = [".png", ".jpg", ".jpeg", ".bmp", ".gif"]
# Image types accepted by TensorFlow
img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]

# Loop through all files in the directory and subdirectories
for filepath in Path(extract_path).rglob("*"):
    # Check if it's a file before proceeding
    if filepath.is_file():
        # Check if the file has a valid image extension
        if filepath.suffix.lower() in image_extensions:
            # Check the actual image type from the file content
            img_type = imghdr.what(filepath)
            if img_type is None:
                print(f"{filepath} is not an image. Deleting...")
                os.remove(filepath)  # Delete the file
            elif img_type not in img_type_accepted_by_tf:
                print(f"{filepath} is a {img_type}, not accepted by TensorFlow. Deleting...")
                os.remove(filepath)  # Delete the file
        else:
            # The file does not have a valid image extension
            print(f"{filepath} is not a recognized image type. Deleting...")
            os.remove(filepath)  # Delete the file
Upvotes: 0
Reputation: 6406
As stated in other answers, you can use the imghdr built-in Python module to guess the image format, and to assert that the file is not corrupted and that its content matches its extension.
However, starting from Python 3.11, imghdr is deprecated (PEP 594) and will be removed in Python 3.13, due to the limited number of formats it supports and its limited functionality. Three alternatives are listed in the PEP: filetype, puremagic and python-magic.
Here is an example using filetype:
from pathlib import Path
import filetype

# Image file formats supported by TensorFlow
img_exts = {"png", "jpg", "gif", "bmp"}

path = Path("train")
for file in path.iterdir():
    if file.is_dir():
        continue
    ext = filetype.guess_extension(file)
    if ext is None:
        print(f"'{file}': extension cannot be guessed from content")
    elif ext not in img_exts:
        print(f"'{file}': not a supported image file")
Upvotes: 1
Reputation: 1
I had the same issue. I went through a lot of the answers above and none of them worked for me, so I wrapped the body of the training loop in a try/except block so that any batch that raises this error is skipped. Please note: this is a workaround, not a direct solution.
max_iterations = len(preprocessed_train_dataset)

for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    # Fresh iterator each epoch, so the dataset is replayed from the start
    iterator = iter(preprocessed_train_dataset)
    # Iterate over the batches of the dataset.
    i = 0
    while i < max_iterations:
        print("Currently running batch {}".format(i))
        try:
            i = i + 1
            x_batch_train, y_batch_train = next(iterator)
            with tf.GradientTape() as tape:
                logits = model(x_batch_train, training=True)
                loss_value = loss_fn(y_batch_train, logits)
            grads = tape.gradient(loss_value, model.trainable_weights)
            optimizer.apply_gradients(zip(grads, model.trainable_weights))
            # Update the training metric with this batch
            train_acc_metric.update_state(y_batch_train, logits)
            # Log every 200 batches.
            if i % 200 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (i, float(loss_value))
                )
                print("Seen so far: %s samples" % ((i + 1) * batch_size))
        except Exception:
            # Skip the batch that raised (e.g. one containing a bad image)
            continue

    train_acc = train_acc_metric.result()
    print("Training acc over epoch: %.4f" % (float(train_acc),))
    # Reset training metrics at the end of each epoch
    train_acc_metric.reset_states()

    for x_batch_val, y_batch_val in preprocessed_val_dataset:
        val_logits = model(x_batch_val, training=False)
        # Update val metrics
        val_acc_metric.update_state(y_batch_val, val_logits)
    val_acc = val_acc_metric.result()
    val_acc_metric.reset_states()
    print("Validation acc: %.4f" % (float(val_acc),))

# Evaluate the model
test_loss, test_accuracy = model.evaluate(preprocessed_test_dataset)
Upvotes: 0
Reputation: 84
TensorFlow is strict about image formats, and this should guide you in deleting the bad images. Sometimes your dataset may even run well with, for instance, Torch, but will generate a format error with TensorFlow. Nonetheless, it is best practice to always carry out preprocessing on the images to ensure a robust, safe and standard model.
from pathlib import Path
import os
import tensorflow as tf

img_link = list(Path("/home/user/datasets/samples/").glob(r'**/*.jpg'))
count_num = 0
for lnk in img_link:
    with open(lnk, 'rb') as binary_img:
        # JFIF (JPEG File Interchange Format) is the standard marker we use
        # to gauge whether a JPEG image is corrupt or substandard
        find_img = tf.compat.as_bytes('JFIF') in binary_img.peek(10)
    if not find_img:
        count_num += 1
        os.remove(str(lnk))
print('Deleted %d images from the dataset in total' % count_num)
# This should help you delete the badly encoded images
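Note that the JFIF check above only covers JPEG files (the glob only matches *.jpg), while the question involves PNGs. A similar first-bytes check can be sketched for all four formats TensorFlow accepts, using each format's magic number (signatures taken from the respective format specifications):
from pathlib import Path

# Magic numbers (file signatures) for the formats TensorFlow accepts
MAGIC_NUMBERS = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
    "gif": b"GIF8",  # covers both GIF87a and GIF89a
    "bmp": b"BM",
}

def detect_format(filepath):
    """Return the detected image format, or None if no signature matches."""
    with open(filepath, "rb") as f:
        header = f.read(8)
    for fmt, magic in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return fmt
    return None

for filepath in Path("./train").rglob("*"):
    if filepath.is_file() and detect_format(filepath) is None:
        print(f"{filepath} does not look like a TF-supported image")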
Upvotes: 3
Reputation: 31
This should work fine; the same approach applies to the other supported types. For example, for PNG:
image = tf.io.read_file("im.png")
image = tf.image.decode_png(image, channels=3)
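Building on this, one way to locate the offending files up front is to run TensorFlow's own decoder over every file before training. A minimal sketch, assuming the ./train layout from the question (the traceback shows the pipeline fails in a DecodeImage node, which is what tf.image.decode_image calls):
import tensorflow as tf
from pathlib import Path

# Try TensorFlow's own decoder on every file to find the ones it rejects
for filepath in Path("./train").rglob("*"):
    if not filepath.is_file():
        continue
    try:
        data = tf.io.read_file(str(filepath))
        tf.image.decode_image(data)  # same decoder the dataset pipeline uses
    except tf.errors.InvalidArgumentError:
        print(f"{filepath} cannot be decoded by TensorFlow")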
Upvotes: 1
Reputation: 11631
Some of your files in the validation folder are not in a format accepted by TensorFlow (JPEG, PNG, GIF, BMP), or may be corrupted. The extension of a file is indicative only, and does not enforce anything about the content of the file.
You might be able to find the culprit using the imghdr module from the Python standard library, and a simple loop.
from pathlib import Path
import imghdr

data_dir = "/home/user/datasets/samples/"
image_extensions = [".png", ".jpg"]  # add there all your images file extensions

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]
for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
This should print out whether you have files that are not images, or that are not what their extension says they are and are not accepted by TF. Then you can either get rid of them or convert them to a format that TensorFlow supports.
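For the "convert" option, here is a minimal sketch using Pillow; it assumes the flagged files are images Pillow can actually open, and re-encodes them in place as real PNGs:
from PIL import Image

def convert_to_png(filepath):
    """Re-encode an image file as a real PNG, in place."""
    with Image.open(filepath) as img:
        img.convert("RGB").save(filepath, format="PNG")

# e.g. call convert_to_png(filepath) instead of printing in the loop above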
Upvotes: 27