Converting input Texture2D to a 1x32x32x3 Tensor for input into an ONNX model in Unity

Question

I have written a Unity script that takes a Texture2D and outputs a class prediction using an ONNX model that I trained on the CIFAR-10 dataset. A similar setup worked for the MNIST dataset, so I think the issue lies in converting the Texture2D to a Tensor, as that is the only difference between the two scripts.

using System.Collections;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;
using Unity.Barracuda;
using UI = UnityEngine.UI;
using TMPro;

public class Cifar10Script : MonoBehaviour
{
    public NNModel onnxAsset;
    public Texture2D imageToRecognise;
    public Texture2D tempImage;

    public UI.RawImage _imageView;
    public UI.Text _textView;

    public TMP_Text m_predictionTextDrawing;
    public TMP_Text m_predictionTextCamera;

    private int m_classPrediction;

    public void RunModel()
    {
        using Tensor input = new Tensor(1, 32, 32, 3);

        // Convert the input Texture2D into a 1x32x32x3 (NHWC) tensor.
        // Texture rows start at the bottom, so the row index is flipped to put the top row first.
        for (int y = 0; y < 32; y++)
        {
            for (int x = 0; x < 32; x++)
            {
                // Nearest-neighbour sample of the source texture.
                int tx = x * tempImage.width / 32;
                int ty = y * tempImage.height / 32;
                Color pixel = tempImage.GetPixel(tx, ty);
                input[0, 31 - y, x, 0] = pixel.r;
                input[0, 31 - y, x, 1] = pixel.g;
                input[0, 31 - y, x, 2] = pixel.b;
            }
        }

        IWorker worker = ModelLoader.Load(onnxAsset).CreateWorker(WorkerFactory.Device.CPU);

        worker.Execute(input);

        Tensor output = worker.PeekOutput();

        float[] outputBuffer = output.ToReadOnlyArray();

        // Find the class with the highest score (argmax).
        float bestScore = float.NegativeInfinity;
        int index = -1;

        for (int i = 0; i < 10; i++)
        {
            print(i + " " + outputBuffer[i]);
            if (outputBuffer[i] > bestScore)
            {
                bestScore = outputBuffer[i];
                index = i;
            }
        }

        worker.Dispose();

        // Treat low-confidence results as "unknown" (-1).
        if (outputBuffer[index] < 0.6f)
        {
            //m_predictionTextDrawing.SetText("?");
            //m_predictionTextCamera.SetText("?");
            m_classPrediction = -1;
        }
        else
        {
            m_classPrediction = index;
            //m_predictionTextDrawing.SetText(index.ToString());
            //m_predictionTextCamera.SetText(index.ToString());
        }
    }

    public int GetClassPrediction()
    {
        return m_classPrediction;
    }

}

The issue I am having is that the prediction scores remain very similar regardless of which image I pass in as a Texture2D. The model always seems to be about 40% certain that the image is a cat.
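
One thing that may be worth checking here is whether the values compared against 0.6 are probabilities at all. If the exported ONNX model ends in a linear layer rather than a softmax (an assumption; the question does not say), the outputs are raw logits. The following is a minimal sketch in plain C#, with hypothetical helper names `ScoreUtils`, `Softmax`, and `ArgMax` that are not part of Barracuda, showing how a score array could be turned into probabilities:

```csharp
using System;
using System.Linq;

// Hypothetical helpers for illustration; not part of Barracuda or Unity.
public static class ScoreUtils
{
    // Numerically stable softmax: subtract the maximum before exponentiating.
    public static float[] Softmax(float[] scores)
    {
        float max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }

    // Index of the largest element (first one wins on ties).
    public static int ArgMax(float[] values)
    {
        int best = 0;
        for (int i = 1; i < values.Length; i++)
        {
            if (values[i] > values[best]) best = i;
        }
        return best;
    }
}
```

In the script above, `ScoreUtils.Softmax(outputBuffer)` could be applied before the 0.6 threshold. This only makes sense for logit outputs; if the model already ends in a softmax, applying it a second time would distort the values.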

I have tried several trained models with different architectures, and I have tried swapping out the images as well. I have also tried constructing the Tensor directly from a Texture, which did not work either. I am now fairly certain that the problem lies in the conversion from Texture2D to Tensor.
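
Because the conversion is the suspected culprit, it may help to isolate it from Unity and check it against known values. The sketch below is plain C# and makes several assumptions not stated in the question: the pixels arrive as a flat RGB float array in bottom-to-top row order (the order `Texture2D.GetPixels` uses), the model expects NHWC layout with the top row first, and training applied a per-channel `(value - mean) / std` normalization. `PixelPacker`, `ToNhwc`, and any mean/std values passed in are hypothetical; the real values would be whatever the training pipeline used.

```csharp
using System;

// Hypothetical helper for illustration; independent of Unity and Barracuda.
public static class PixelPacker
{
    // rgbBottomUp: size*size*3 floats in [0, 1], rows ordered bottom to top, RGB interleaved.
    // Returns size*size*3 floats in NHWC order (N = 1) with the top row first,
    // normalized per channel as (value - mean[c]) / std[c].
    public static float[] ToNhwc(float[] rgbBottomUp, int size, float[] mean, float[] std)
    {
        if (rgbBottomUp.Length != size * size * 3)
            throw new ArgumentException("Unexpected pixel buffer length.");

        float[] result = new float[size * size * 3];
        for (int y = 0; y < size; y++)
        {
            int srcRow = size - 1 - y; // flip vertically: output row 0 is the top row
            for (int x = 0; x < size; x++)
            {
                for (int c = 0; c < 3; c++)
                {
                    float value = rgbBottomUp[(srcRow * size + x) * 3 + c];
                    result[(y * size + x) * 3 + c] = (value - mean[c]) / std[c];
                }
            }
        }
        return result;
    }
}
```

Feeding a small hand-made buffer through `ToNhwc` and printing the result makes it easy to confirm orientation and value range before involving the model. If the model was trained on inputs in the 0 to 255 range, or with a channels-first layout, the packing would need to change accordingly.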

