
Tensorflow Lite LSTM Model for Microprocessors on ESP32, "Type INT32 (2) not supported"

The current error, seen when running the model on the ESP32 (an Adafruit ESP32 Huzzah):

Type INT32 (2) not supported.
Node ADD (number 4) failed to invoke with status 1
Node WHILE (number 2) failed to invoke with status 1

Preamble

I am currently developing a system to predict a person's future movement using an ESP32 connected to several sensors. The system takes 20 samples from 8 sensors over the previous 20 cycles to build a 20x8 snapshot, which is fed into a TensorFlow Lite LSTM model. The model outputs a 1x10 array containing its prediction of the wearer's leg position for the next 10 cycles. I have previous experience with a stateful convolutional neural network, using EloquentTinyML to simplify the process; that worked fine, but EloquentTinyML does not seem to work with LSTM models.

The TensorFlow model:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers import Input, LSTM, Dense

pred = 10 # number of future cycles to predict (the model outputs a 1x10 array)

model = tf.keras.Sequential([
    Input(shape=(20, 8), name="Input"),
    LSTM(units=48, return_sequences=True, activation='relu', unroll=False),
    LSTM(units=16, return_sequences=False, activation='relu', unroll=False),
    layers.Flatten(),
    Dense(units=pred, name="output")
    ])

By default this model takes in float32 values normalised between -1 and 1.
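
For reference, this is roughly how I understand those normalised floats to map onto int8 once the model is fully quantised; the scale and zero_point below are placeholder values, the real ones come from the converted model's input tensor:

import numpy as np

scale, zero_point = 1.0 / 127.0, 0  # placeholder quantisation parameters, not the real ones

def float_to_int8(x):
    # q = round(x / scale) + zero_point, clamped to the int8 range
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

print(float_to_int8(np.array([-1.0, 0.0, 0.5, 1.0])))  # e.g. [-127 0 64 127]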

The process of converting the model, generated in Python using Keras and TensorFlow, to TFLite:

import numpy as np
import random as rd

# data_train is the training data array, defined elsewhere
def representative_data_gen():
    samples = int(data_train.shape[0] / 10) # use 1/10th of data_train as samples
    rSamp = rd.sample(range(data_train.shape[0]), samples) # randomly choose indices within data_train
    for i in rSamp: # for each chosen index within data_train
        yield [data_train[i].astype(np.float32)] # yield the data at that index

run_model = tf.function(lambda x: model(x))
BATCH_SIZE = 1 
STEPS = 20 #fails if not 20
INPUT_SIZE = 8 #fails if not 8
concrete_func = run_model.get_concrete_function(tf.TensorSpec([BATCH_SIZE, STEPS, INPUT_SIZE], model.inputs[0].dtype))

# model directory.
MODEL_DIR = "keras_lstm"
model.save(MODEL_DIR, save_format="tf", signatures=concrete_func)

print("saved")
converter = tf.lite.TFLiteConverter.from_saved_model(MODEL_DIR)
#converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen #required to quantise to int8
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  
converter.inference_output_type = tf.int8  
tflite_model = converter.convert()

open(fileLoc+fileName, "wb").write(tflite_model)

This code saves the model in the SavedModel format, then loads it and converts it to the .tflite format. The use of converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8] and the setting of inference_input_type = tf.int8 were an attempt to force the model to accept int8 rather than float32; it does not seem to work. The code is not entirely my own work, taking inspiration from Here and Here.
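
As a sanity check after conversion, something along these lines can confirm whether the converted model really expects int8 and what its quantisation parameters are (this is only a verification sketch using the desktop TFLite interpreter, not part of the conversion itself):

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="modelLSTM4816_10.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
print(inp["dtype"], inp["shape"], inp["quantization"])  # expect int8, [1 20 8], (scale, zero_point)
print(out["dtype"], out["shape"], out["quantization"])  # expect int8, [1 10]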

The .tflite model is then converted to a header file using vim and xxd, e.g.:

xxd -i modelLSTM4816_10.tflite > exoLSTM4816_10q.h
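
If xxd is not available, an equivalent header can be generated with a short Python script; the symbol names below are placeholders chosen to match the model_data identifier the ESP32 code expects, rather than xxd's default output names:

data = open("modelLSTM4816_10.tflite", "rb").read()
with open("exoLSTM4816_10q.h", "w") as f:
    f.write("unsigned char model_data[] = {\n")
    f.write(", ".join(str(b) for b in data))
    f.write("\n};\nunsigned int model_data_len = %d;\n" % len(data))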

To deploy this model on the ESP32 I am using the tflite-micro ESP Examples library, which I have also drawn on for reference code.

All relevant code for the model and the ESP32 can be found Here.

[The ESP code contains all libraries; the relevant code is within /src.] Data is fed into the model as a 160-length array.
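
For clarity, that 160-length array is just the 20x8 snapshot flattened in row-major order (each group of 8 consecutive values is one cycle's sensor readings, matching how the ESP32 code prints a separator every 8 values); a trivial sketch with dummy data:

import numpy as np

snapshot = np.zeros((20, 8), dtype=np.int8)  # dummy snapshot: 20 cycles x 8 sensors
flat = snapshot.reshape(-1)                  # 160 int8 values, cycle 0's sensors first
assert flat.shape == (160,)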

The Error

The current error occurs when calling interpreter->Invoke(), giving:

Type INT32 (2) not supported.
Node ADD (number 4) failed to invoke with status 1
Node WHILE (number 2) failed to invoke with status 1

My assumption has been that this error relates to the model not accepting float32 data, which led to the model being quantised to int8 format. I have confirmed via Netron that the model's input and output are in int8 format, and the data fed into it is of int8_t format. Yet the error remains.

My second assumption is that it relates in some way to the ADD and WHILE nodes. A post with a very similar issue Here has errors involving STRIDED_SLICE instead of ADD and WHILE; these appear to be built-in operators called by the AllOpsResolver. add.cc has cases for different input types (int8, float32, etc.), including a case for kTfLiteInt8, which is equal to 9, yet it seems not to recognise my model as int8. Checking my model's input and output types with model_input->type and model_output->type produces a 9 in both instances.
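
One way to see where an INT32 tensor could be coming from despite the int8 input and output (my understanding is that the WHILE loop generated for the non-unrolled LSTM carries an int32 loop counter, which the inner ADD node increments regardless of quantisation) is to dump the per-tensor types with the desktop interpreter:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="modelLSTM4816_10.tflite")
interpreter.allocate_tensors()
for t in interpreter.get_tensor_details():
    print(t["index"], t["name"], t["dtype"])  # look for int32 tensors feeding the ADD/WHILE nodes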

Despite the errors I do get outputs; however, the 10 predictions are just the first 10 input values.

I've spent about 2-3 weeks trying to get an LSTM model to work on an ESP32 and have run out of patience and ideas. Any assistance would be appreciated.

A dump of my ESP32 code for quick reference:

#include <Arduino.h>
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"

//network C files, only include one at a time
//#include "exoConv3224_10.h" //Conv1D network
//#include "exoLSTM3216_10_Rolled.h" //unrolled, default LSTM option
//include "exoLSTM3216_10_UnRoll.h" //rolled, non default LSTM option
//#include "exoLSTM48_10.h" //rolled, simple LSTM
#include "exoLSTM4816_10q.h" //quantised simple LSTM
#include "exoMoveIdeal.h"

namespace {
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* model_input = nullptr;
    TfLiteTensor* model_output = nullptr;
    //FeatureProvider* feature_provider = nullptr;
    //RecognizeCommands* recognizer = nullptr;
    int32_t previous_time = 0;

    // Create an area of memory to use for input, output, and intermediate arrays.
    // The size of this will depend on the model you're using, and may need to be
    // determined by experimentation.
    constexpr int kFeatureSliceSize = 20; //number of samples per sensor
    constexpr int kFeatureSliceCount = 8; //number of sensors
    constexpr int kFeatureElementCount = (kFeatureSliceSize * kFeatureSliceCount); //total number of elements

    constexpr int kTensorArenaSize = 80 * 1024; //vary this with model size
    uint8_t tensor_arena[kTensorArenaSize];
    float feature_buffer[kFeatureElementCount]; //store features in buffer
    int8_t* model_input_buffer = nullptr;
}  // namespace

void setup()
{
    Serial.begin(115200);
    Serial.println("Loading Tensorflow model....");
    model = tflite::GetModel(model_data); //get the model
    Serial.println("Model Loaded");

    tflite::AllOpsResolver resolver; //could use the limited MicroMutableOpResolver and register only the required ops
    static tflite::MicroInterpreter static_interpreter(model, resolver, tensor_arena, kTensorArenaSize);
    Serial.println("Resolver");
    interpreter = &static_interpreter;
    TfLiteStatus allocate_status = interpreter->AllocateTensors();
    Serial.println("Allocated Tensors");
    if (allocate_status != kTfLiteOk) 
    {
        Serial.println("AllocateTensors() failed");
        Serial.println(allocate_status); //print status
        return;
    }
    
    model_input = interpreter->input(0);
    model_output = interpreter->output(0); //get outputs
    //model_input->dims->data[0] = 8;
    //model_input->dims->data[1] = 160;
    //model_input->dims->size = 2;
    Serial.println(model_input->dims->size); //output 3, should be 2?
    Serial.println(model_input->dims->data[0]); //output 1, correct?
    Serial.println(model_input->dims->data[1]); //20
    Serial.println(model_input->dims->data[2]); //8
    //Serial.println(model_input->type); //type, int8 outputs a 9
    Serial.println("");
    Serial.println("Create Buffer");
    //model_input_buffer = model_input->data.int8; //give model_input_buffer an address where data will be placed
    Serial.printf("%p\n",(void *)model_input_buffer);

    //delay(1000);
    Serial.println("Fill Buffer");
    int i = 0;

    for(i = 0; i < 160; i++) //add from test array to buffer, should be 160
    { //ideally input data should be normalised between -1 and 1, not sure how that would be compatible with int8? Replace -1 to 1 with -128 to 127?
        if(i%8==0) //Separate out each sample on print
            Serial.print("| ");
        //model_input_buffer[i] = mTestq2[i]; //160-length list of int8 values
        model_input->data.int8[i] = mTestq2[i];  //160-length list of int8 values
        Serial.printf("%d ",model_input->data.int8[i]);

    }

    Serial.println("\nInvoke");
    interpreter->Invoke(); //do network stuff, currently fails with the invalid INT32 type
    Serial.println("Get Output");

    model_output = interpreter->output(0); //get outputs
    Serial.println(model_output->type); //type, int8 outputs a 9
    Serial.println(model_output->dims->size);//print output pointer data
    Serial.println(model_output->dims->data[0]); //1x10 output, as it should be
    Serial.println(model_output->dims->data[1]);
    Serial.println("Print Predictions");
    //Serial.printf("%p\n",(void *)model_output);
    for(i = 0; i < 10; i++) //output 10 predictions, currently just outputs first 10 inputs as no processing done to it
    {
        Serial.printf("%d, ", model_output->data.int8[i]);
    }
    

}

void loop() //repeats basic loop of recognition, still fails
{
    int i = 0;
    delay(1000);
    for(i = 0; i < 160; i++) //add from test array to buffer, should be 160
    { //ideally input data should be normalised between -1 and 1, not sure how that would be compatible with int8? Replace -1 to 1 with -128 to 127?
        //model_input_buffer[i] = mTestq2[i]; //160-length list of int8 values
        model_input->data.int8[i] = mTestq2[i];  //160-length list of int8 values
    }
    interpreter->Invoke(); //do network stuff, currently fails with the invalid INT32 type
    model_output = interpreter->output(0); //get outputs
    Serial.println("Print Predictions");
    //Serial.printf("%p\n",(void *)model_output);
    for(i = 0; i < 10; i++) //output 10 predictions, currently just outputs first 10 inputs as no processing done to it
    {
        Serial.printf("%d, ", model_output->data.int8[i]);
    }
}


Answers (1)


This issue has a potential solution: see link. Essentially, a patch must be applied to the add.cc file; currently the ADD op in the TF-Micro library, and by extension in the TF-Micro-ESP-Examples library, does not properly support INT32. This may change in the future.
