Portia

How to load an image dataset with XML annotations (containing the labels) into a TensorFlow dataset?

I am trying to import this dataset into TensorFlow. The images are in a folder called 'Images' and are named by their indices. The labels for the machine learning model are the number of tomatoes in each image. I get these by counting the objects in the XML files in the folder 'Annotations', which are also named by the corresponding image indices.

How would one import a dataset like this into a TensorFlow dataset? So far, I have been able to import the images without the labels using this:

import tensorflow as tf

def load_and_preprocess_image(path):
  image = tf.io.read_file(path)
  image = tf.image.decode_jpeg(image, channels=3)
  image = tf.image.resize(image, [224, 224])
  return image

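# list_files needs a glob pattern rather than a bare directory; shuffle=False keeps a deterministic file order
dataset = tf.data.Dataset.list_files('TomatoPlantfactoryDataset/TomatoPlantfactoryDataset/Images/*.jpg', shuffle=False)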
dataset = dataset.map(load_and_preprocess_image).batch(32)
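A note on `tf.data.Dataset.list_files`: it expects a glob pattern rather than a bare directory, and it shuffles the file names by default (`shuffle=True`). That means the image order would not line up with a label list built separately. Here is a minimal sketch that assumes the images are `.jpg` files. `make_image_dataset` and `images_dir` are hypothetical names used only for illustration:

```python
import os
import tensorflow as tf

def load_and_preprocess_image(path):
    # Read the raw bytes, decode them as a 3-channel JPEG and resize to 224x224.
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    return tf.image.resize(image, [224, 224])

def make_image_dataset(images_dir, batch_size=32):
    # Hypothetical helper. list_files takes a glob pattern, not a bare directory.
    pattern = os.path.join(images_dir, "*.jpg")
    # shuffle=False: by default, file names come back in a random order.
    files = tf.data.Dataset.list_files(pattern, shuffle=False)
    return files.map(load_and_preprocess_image).batch(batch_size)
```

Shuffling for training can still be added later with `Dataset.shuffle`, once images and labels travel together in the same dataset.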

And I have been able to extract the tomato counts from the XML annotations into a list using this:

import xml.etree.ElementTree as ET
from os import listdir, path

def parse_annotation_xml(file_path):
  tree = ET.parse(file_path)
  root = tree.getroot()
  count = len(root.findall('object'))
  return count

mypath = 'TomatoPlantfactoryDataset/TomatoPlantfactoryDataset/Annotations'
files = [path.join(mypath, f) for f in listdir(mypath) if f.endswith('.xml')]

tomato_counts = []

for file in files:
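    # key each count by the file name without its extension instead of a fixed [64:68] slice
    tomato_counts.append([path.splitext(path.basename(file))[0], parse_annotation_xml(file)])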
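Slicing a path string with `[64:68]` depends on the exact length of the directory path. A sturdier option is to key each count by the annotation's file name without its extension. This assumes each annotation is named like its image (for example `0001.xml` for `0001.jpg`). `load_counts` is a hypothetical helper. Also, `root.findall('object')` only matches `<object>` elements that are direct children of the root. If the objects were nested deeper, `root.iter('object')` would be needed instead.

```python
import os
import xml.etree.ElementTree as ET

def parse_annotation_xml(file_path):
    # Tomato count = number of <object> elements directly under the root.
    return len(ET.parse(file_path).getroot().findall("object"))

def load_counts(annotations_dir):
    # Hypothetical helper: map image index (file stem) -> tomato count.
    counts = {}
    for name in sorted(os.listdir(annotations_dir)):
        stem, ext = os.path.splitext(name)
        if ext.lower() == ".xml":
            counts[stem] = parse_annotation_xml(os.path.join(annotations_dir, name))
    return counts
```

A dict keyed by index lets each image look up its own label by its stem, so the label for `0001.jpg` would be `counts['0001']`. This avoids relying on two listings coming back in the same order.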

But TensorFlow dataset objects are opaque to me, and I have no experience manipulating them (they behave nothing like DataFrames or NumPy arrays). As a result, I have not been able to attach the labels from my tomato_counts list to the images in the tf.data.Dataset object.

My approach may be entirely wrong. Any help is appreciated. Thank you!
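One common pattern for attaching labels works like this:

1. Build two aligned Python lists first, one of image paths and one of counts.
2. Pass them as a tuple to `tf.data.Dataset.from_tensor_slices`. Each dataset element is then an `(image_path, count)` pair.
3. Use `map` to turn each pair into `(image, count)`.

Here is a minimal sketch. It assumes the `.jpg` images and `.xml` annotations share a file stem, and it skips images without an annotation. `build_dataset`, `count_tomatoes` and `load_image` are hypothetical names:

```python
import os
import xml.etree.ElementTree as ET
import tensorflow as tf

def count_tomatoes(xml_path):
    # Tomato count = number of <object> elements directly under the root.
    return len(ET.parse(xml_path).getroot().findall("object"))

def load_image(path, label):
    # Decode and resize the image; pass the label through unchanged.
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    return image, label

def build_dataset(images_dir, annotations_dir, batch_size=32):
    image_paths, labels = [], []
    for name in sorted(os.listdir(images_dir)):
        stem, ext = os.path.splitext(name)
        xml_path = os.path.join(annotations_dir, stem + ".xml")
        # Keep only images that have a matching annotation file.
        if ext.lower() == ".jpg" and os.path.exists(xml_path):
            image_paths.append(os.path.join(images_dir, name))
            labels.append(count_tomatoes(xml_path))
    # Slicing a tuple of equal-length lists yields (path, label) pairs.
    ds = tf.data.Dataset.from_tensor_slices((image_paths, labels))
    ds = ds.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size)
```

With the folders from the question, usage could look like `build_dataset('TomatoPlantfactoryDataset/TomatoPlantfactoryDataset/Images', 'TomatoPlantfactoryDataset/TomatoPlantfactoryDataset/Annotations')`. The result yields `(images, counts)` batches that `model.fit` can consume directly. Because paths and labels are paired in plain Python before TensorFlow is involved, the matching logic stays easy to inspect.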
