I am trying to import this dataset into TensorFlow. The images are contained in a folder called 'Images' and are named by their indices. The label for each image is the number of tomatoes it contains, obtained by counting the objects in the XML file in the 'Annotations' folder that is named with the same image index.
How would one import a dataset like this into a TensorFlow dataset (tf.data.Dataset)? So far, I have been able to import the images without the labels, using this:
import tensorflow as tf
def load_and_preprocess_image(path):
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    return image
# list_files expects a glob pattern, so match the files inside the folder
dataset = tf.data.Dataset.list_files('TomatoPlantfactoryDataset/TomatoPlantfactoryDataset/Images/*')
dataset = dataset.map(load_and_preprocess_image).batch(32)
I have also been able to read the tomato counts from the XML annotations into a list using this:
import xml.etree.ElementTree as ET
def parse_annotation_xml(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()
    count = len(root.findall('object'))
    return count
from os import listdir, path
mypath = 'TomatoPlantfactoryDataset/TomatoPlantfactoryDataset/Annotations'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]
tomato_counts = []
for file in files:
    # file[64:68] is the first four characters of the file name, i.e. the image index
    tomato_counts.append([file[64:68], parse_annotation_xml(file)])
But the TensorFlow dataset objects are opaque to me and I have no experience manipulating them (they are not at all similar to DataFrames or NumPy arrays), so I have not been able to assign the labels from my tomato_counts list to the images in the TensorFlow dataset object.
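What I am aiming for is roughly the following sketch. It is untested against the real dataset and rests on assumptions: that each image and its annotation share the same file stem (for example a hypothetical `0001.jpg` and `0001.xml`), and that the images are JPEGs. The helper names `count_objects`, `paired_paths_and_counts`, and `make_dataset` are made up for illustration. The idea is to do the matching in plain Python first, then hand two parallel lists to `tf.data.Dataset.from_tensor_slices` so every element is a (path, count) pair.

```python
import os
import xml.etree.ElementTree as ET


def count_objects(xml_path):
    """Count the <object> elements directly under the XML root."""
    root = ET.parse(xml_path).getroot()
    return len(root.findall("object"))


def paired_paths_and_counts(image_dir, annotation_dir):
    """Return parallel lists of image paths and tomato counts.

    Assumption: an image and its annotation share the same file stem.
    Images without a matching XML file are skipped.
    """
    image_paths, counts = [], []
    for name in sorted(os.listdir(image_dir)):
        stem, _ = os.path.splitext(name)
        xml_path = os.path.join(annotation_dir, stem + ".xml")
        if os.path.isfile(xml_path):
            image_paths.append(os.path.join(image_dir, name))
            counts.append(count_objects(xml_path))
    return image_paths, counts


def make_dataset(image_paths, counts, batch_size=32):
    """Build a batched dataset of (image, count) pairs."""
    # Imported here so the pure-Python helpers above work without TensorFlow.
    import tensorflow as tf

    def load(path, count):
        image = tf.io.read_file(path)
        image = tf.image.decode_jpeg(image, channels=3)
        image = tf.image.resize(image, [224, 224])
        return image, count

    # Slicing a tuple of lists yields one (path, count) pair per element.
    dataset = tf.data.Dataset.from_tensor_slices((image_paths, counts))
    return dataset.map(load).batch(batch_size)
```

With the folders from above, this would be called as `paths, counts = paired_paths_and_counts('.../Images', '.../Annotations')` followed by `dataset = make_dataset(paths, counts)`. Pairing the paths and counts before building the dataset avoids having to look up a label from inside a TensorFlow `map` function.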
My approach may be wrong entirely. Any help is appreciated. Thank you