k6074526

Reputation: 9

Why are the probabilities always the same with MobileNet V2 model (mobilenet_v2_1.4_224.tflite)?

I am implementing a TensorFlow Lite model in my Android application using mobilenet_v2_1.4_224.tflite, which I downloaded from the TensorFlow models GitHub repository (MobileNet from TensorFlow Models).

The app works as follows:

  1. I capture an image using the camera and save it as a temporary file.
  2. The image is then resized to 224x224 pixels and normalized as per the preprocessing steps of MobileNet (subtracting 127.5 and dividing by 127.5).
  3. Finally, the normalized image is converted to a ByteBuffer and passed to the model for inference.
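For reference, the per-channel math in step 2 can be sketched in plain Kotlin (a hypothetical standalone helper, not part of the app code below): extract each channel from the packed ARGB int and map [0, 255] to [-1, 1]:

```kotlin
// Hypothetical standalone sketch of the MobileNet float preprocessing:
// pull R/G/B out of a packed ARGB pixel and map [0, 255] -> [-1, 1].
fun normalize(channel: Int): Float = (channel - 127.5f) / 127.5f

fun pixelToFloats(pixel: Int): FloatArray = floatArrayOf(
    normalize(pixel shr 16 and 0xFF), // Red
    normalize(pixel shr 8 and 0xFF),  // Green
    normalize(pixel and 0xFF)         // Blue
)
```

Black (0xFF000000) maps to -1.0 per channel and white (0xFFFFFFFF) to 1.0, which is the input range the float MobileNet models expect.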

While the model runs without any exceptions, the probabilities returned are always the same for all classes, regardless of the input image. For example, the probability for each class is consistently near zero or uniform, as if the model isn’t responding to the input.
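As an aside, one hypothesis I have not been able to rule out: java.nio buffers are position-sensitive, so a ByteBuffer handed off with its position still at the end of the written data can look empty to a consumer that honors position/remaining. A minimal demonstration in plain Kotlin (no TensorFlow Lite involved):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// After putFloat() calls, the buffer's position sits at the end of the data,
// so remaining() reports 0 until rewind() resets the position to 0.
fun remainingAfterWrite(values: FloatArray): Pair<Int, Int> {
    val buf = ByteBuffer.allocateDirect(values.size * 4).order(ByteOrder.nativeOrder())
    values.forEach { buf.putFloat(it) }
    val before = buf.remaining() // 0: nothing left to read
    buf.rewind()
    val after = buf.remaining()  // the full payload is visible again
    return before to after
}
```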

Full Code Implementation:

MainActivity.kt

package com.example.myapplication

import android.Manifest
import android.content.pm.PackageManager
import android.graphics.Bitmap
import android.graphics.BitmapFactory
import android.net.Uri
import android.os.Bundle
import android.util.Log
import android.widget.Button
import android.widget.FrameLayout
import android.widget.Toast
import androidx.activity.ComponentActivity
import androidx.activity.result.contract.ActivityResultContracts
import androidx.core.content.ContextCompat
import androidx.core.content.FileProvider
import java.io.File
import java.io.InputStream
import java.nio.ByteBuffer
import java.nio.ByteOrder
import java.text.SimpleDateFormat
import java.util.*

class MainActivity : ComponentActivity() {
    private lateinit var photoUri: Uri
    private lateinit var mobileNetClassifier: MobileNetClassifier

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        mobileNetClassifier = MobileNetClassifier(this)

        // Layout programmatically
        val layout = FrameLayout(this).apply {
            id = FrameLayout.generateViewId()
        }
        setContentView(layout)

        val button = Button(this).apply {
            text = "Take a Photo"
            setOnClickListener { checkPermissionsAndOpenCamera() }
        }
        layout.addView(button)

        // Align button in the center
        val params = FrameLayout.LayoutParams(
            FrameLayout.LayoutParams.WRAP_CONTENT,
            FrameLayout.LayoutParams.WRAP_CONTENT
        ).apply {
            gravity = android.view.Gravity.CENTER
        }
        button.layoutParams = params
    }

    private fun analyzePhoto(photoUri: Uri) {
        val inputStream: InputStream? = contentResolver.openInputStream(photoUri)
        val bitmap = BitmapFactory.decodeStream(inputStream)
        inputStream?.close()

        // Convert the image to ByteBuffer
        val byteBuffer = convertBitmapToByteBuffer(bitmap)

        // Get the prediction
        val result = mobileNetClassifier.classifyImage(byteBuffer)

        // Display the result
        Toast.makeText(this, "Result: $result", Toast.LENGTH_LONG).show()
    }

    private fun convertBitmapToByteBuffer(bitmap: Bitmap): ByteBuffer {
        val IMAGE_MEAN = 127.5f
        val IMAGE_STD = 127.5f
        val IMAGE_SIZE_X = 224
        val IMAGE_SIZE_Y = 224
        val DIM_PIXEL_SIZE = 3
        val NUM_BYTES_PER_CHANNEL = 4 // Float size

        // Resize bitmap to match model input size
        val resizedBitmap = Bitmap.createScaledBitmap(bitmap, IMAGE_SIZE_X, IMAGE_SIZE_Y, false)

        val intValues = IntArray(IMAGE_SIZE_X * IMAGE_SIZE_Y)
        resizedBitmap.getPixels(intValues, 0, resizedBitmap.width, 0, 0, resizedBitmap.width, resizedBitmap.height)

        val byteBuffer = ByteBuffer.allocateDirect(
            IMAGE_SIZE_X * IMAGE_SIZE_Y * DIM_PIXEL_SIZE * NUM_BYTES_PER_CHANNEL
        )
        byteBuffer.order(ByteOrder.nativeOrder())
        byteBuffer.rewind()

        // Normalize pixel values to [-1, 1] (MobileNet float preprocessing)
        for (pixel in intValues) {
            byteBuffer.putFloat(((pixel shr 16 and 0xFF) - IMAGE_MEAN) / IMAGE_STD) // Red
            byteBuffer.putFloat(((pixel shr 8 and 0xFF) - IMAGE_MEAN) / IMAGE_STD)  // Green
            byteBuffer.putFloat(((pixel and 0xFF) - IMAGE_MEAN) / IMAGE_STD)        // Blue
        }
        byteBuffer.rewind() // Reset the position so the interpreter reads from the start
        return byteBuffer
    }

    private fun checkPermissionsAndOpenCamera() {
        when {
            ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA) == PackageManager.PERMISSION_GRANTED -> {
                openCamera()
            }
            else -> {
                requestPermissionLauncher.launch(Manifest.permission.CAMERA)
            }
        }
    }

    private val requestPermissionLauncher = registerForActivityResult(
        ActivityResultContracts.RequestPermission()
    ) { isGranted: Boolean ->
        if (isGranted) {
            openCamera()
        }
    }

    private val takePictureLauncher = registerForActivityResult(
        ActivityResultContracts.TakePicture()
    ) { isSaved: Boolean ->
        if (isSaved) {
            analyzePhoto(photoUri)
        }
    }

    private fun openCamera() {
        val photoFile = createImageFile()
        photoUri = FileProvider.getUriForFile(
            this,
            "${packageName}.provider",
            photoFile
        )
        takePictureLauncher.launch(photoUri)
    }

    private fun createImageFile(): File {
        val timestamp = SimpleDateFormat("yyyyMMdd_HHmmss", Locale.US).format(Date())
        val storageDir = getExternalFilesDir(null)
        return File.createTempFile(
            "JPEG_${timestamp}_",
            ".jpg",
            storageDir
        )
    }
}

MobileNetClassifier.kt

package com.example.myapplication

import android.content.Context
import android.util.Log
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil
import java.io.InputStream
import java.nio.ByteBuffer

class MobileNetClassifier(context: Context) {

    private val interpreter: Interpreter
    private val labels: List<String>

    init {
        interpreter = loadModel(context, "mobilenet_v2_1.4_224.tflite")
        labels = loadLabels(context)
        Log.d("MobileNetClassifier", "Model and labels successfully loaded")
    }

    private fun loadModel(context: Context, modelFileName: String): Interpreter {
        return try {
            val model = FileUtil.loadMappedFile(context, modelFileName)
            Interpreter(model)
        } catch (e: Exception) {
            Log.e("MobileNetClassifier", "Error loading model file: $modelFileName", e)
            throw RuntimeException("Failed to load model", e)
        }
    }

    private fun loadLabels(context: Context): List<String> {
        val labelsList = mutableListOf<String>()
        try {
            val inputStream: InputStream = context.assets.open("labels.txt")
            inputStream.bufferedReader().useLines { lines ->
                lines.forEach { line ->
                    if (line.isNotBlank()) labelsList.add(line.trim())
                }
            }
        } catch (e: Exception) {
            Log.e("MobileNetClassifier", "Error loading labels", e)
            throw RuntimeException("Failed to load labels", e)
        }
        return labelsList
    }

    fun classifyImage(byteBuffer: ByteBuffer): String {
        // The float MobileNet models output 1001 scores (index 0 is a
        // "background" class), so labels.txt must line up with that indexing.
        val output = Array(1) { FloatArray(1001) }
        interpreter.run(byteBuffer, output)
        val probabilities = output[0]
        val maxIndex = probabilities.indices.maxByOrNull { probabilities[it] }
        return labels.getOrNull(maxIndex ?: -1) ?: "Unknown"
    }
}
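One detail worth double-checking in classifyImage: the 1001-way output includes a background class at index 0, while many labels.txt files contain only 1000 entries, which would shift every label by one. A hypothetical offset-aware lookup (the hasBackgroundClass flag is my own naming, not a TensorFlow Lite API) might look like:

```kotlin
// Hypothetical helper: map an argmax over a 1001-way output onto a
// 1000-entry label list by skipping the background class at index 0.
fun topLabel(probabilities: FloatArray, labels: List<String>, hasBackgroundClass: Boolean): String {
    val maxIndex = probabilities.indices.maxByOrNull { probabilities[it] } ?: return "Unknown"
    val labelIndex = if (hasBackgroundClass) maxIndex - 1 else maxIndex
    return labels.getOrNull(labelIndex) ?: "Unknown"
}
```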

Issue Details:

Despite following the recommended preprocessing and using a valid .tflite model, the output probabilities are identical for all inputs. Could this be an issue with the preprocessing or the model file itself?

What did I try?

  1. Tested with multiple images: I captured different photos with distinct content (e.g., objects, landscapes), but the classification probabilities remain the same every time.
  2. Validated the model loading process: I ensured that the mobilenet_v2_1.4_224.tflite model was correctly loaded using TensorFlow Lite's FileUtil.loadMappedFile method.
  3. Verified input processing: I reviewed the convertBitmapToByteBuffer function to confirm that the pixel normalization (mean subtraction and division by the standard deviation) was implemented correctly.
  4. Logged the ByteBuffer values: I logged the first 10 values of the ByteBuffer input sent to the model to verify that they change between images. The logs show that the input buffer is indeed different for each image.
  5. Checked the output probabilities: I inspected the output probabilities after inference, and they are always the same, regardless of the input image.
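For that last check, logging only the argmax can hide whether the distribution is genuinely uniform or merely constant. A small plain-Kotlin helper (hypothetical, independent of the app code) to dump the top-5 entries instead:

```kotlin
// Hypothetical diagnostic: return the five highest-scoring (index, score)
// pairs so a flat vs. merely constant distribution is visible in the logs.
fun top5(probabilities: FloatArray): List<Pair<Int, Float>> =
    probabilities.withIndex()
        .sortedByDescending { it.value }
        .take(5)
        .map { it.index to it.value }
```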

What was I expecting?

I expected the probabilities to differ based on the image content. Since MobileNet is a pre-trained image classification model, it should produce varying outputs for distinct inputs, especially for such different photos.

Upvotes: 0

Views: 48

Answers (0)
