Ria Ghosh

Reputation: 21

Issue with PyTorch -> Core ML conversion for SSD MobileNetV3 Object Detection Model

My objective is to develop an object detection model for iOS and Android. I have trained an SSD MobileNetV3 model with the PyTorch framework on a COCO-format dataset, following the official PyTorch documentation. The PyTorch model works fine and we can run inference with it as well.

Now I have to convert the PyTorch model to Core ML. I followed the Apple documentation and was able to convert the model successfully. However, on loading the model in Xcode, it gives pixel buffer errors at prediction time, even though a valid input shape is being sent to the model.

The code used for the PyTorch to Core ML conversion:

import torch
import torchvision
import numpy as np
import coremltools as ct

model_path = '<your-model-path>/checkpoint.pth'  # Replace with your model file path
model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights_backbone=None)
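# Assumption: the checkpoint comes from the torchvision detection reference
# training script, which stores the weights under a "model" key alongside
# optimizer state; weights_only=False is needed because the file holds more
# than bare tensors.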
state_dict = torch.load(model_path, map_location="cpu", weights_only=False)["model"]
model.load_state_dict(state_dict)
model.eval()

# Wrapper for compatibility with torch.jit.trace
class SSDWrapper(torch.nn.Module):
    def __init__(self, model):
        super(SSDWrapper, self).__init__()
        self.model = model

    def forward(self, x):
        # Extract only the required tensor output (e.g., bounding boxes and scores)
        outputs = self.model(x)
        boxes = outputs[0]["boxes"]  # Bounding boxes
        scores = outputs[0]["scores"]  # Scores
        labels = outputs[0]["labels"]  # Class labels

        # Clamp scores at zero (scores are already non-negative, so this is effectively a no-op)
        max_scores = torch.maximum(scores, torch.tensor(0.0, device=scores.device))
        return boxes, max_scores, labels

# Wrap the model
wrapped_model = SSDWrapper(model)

# Create a dummy input to trace the model
dummy_input = torch.rand(size=(1, 3, 320, 320))  # Adjust size based on your dataset

# Trace the model using torch.jit.trace
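# strict=False is needed because the detection model returns a list of dicts;
# the wrapper above already reduces that to plain tensors, keeping the trace valid.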
traced_model = torch.jit.trace(wrapped_model, dummy_input, strict=False)
traced_model.eval()

# Define preprocessing parameters
mean = np.array((0.485, 0.456, 0.406))
std = np.array((0.229, 0.224, 0.225))
scale = 1.0 / (0.226 * 255.0)
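# Core ML applies y = scale * x + bias to each pixel. With scale = 1/(0.226*255)
# (0.226 approximates the average of the per-channel stds) and bias = -mean/std,
# an 8-bit pixel p maps to roughly (p/255 - mean)/std, matching the ImageNet
# normalization that torchvision models expect.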

# Specify input as an image type for Core ML
image_input = ct.ImageType(
    name="input",
    shape=(1, 3, 320, 320),
    scale=scale,
    bias=-mean / std
)

# Convert the traced model to Core ML
mlmodel = ct.convert(
    traced_model,
    inputs=[image_input],
    minimum_deployment_target=ct.target.iOS16,
)

# Save the Core ML model
mlmodel.save("ssd_mobilenetv3.mlpackage")
print("Core ML model conversion complete.")

The model produced was, as expected, an .mlpackage, and it loads in Xcode. But when we try to run inference on it, we get the following error:

Cannot create CVPixelBufferPool with kCVPixelBufferHeightKey value (0) <= 0.
Failed to create a CVPixelBufferPool for frame size 4 x 0 with pixel format type L00h because CVPixelBufferPoolCreate returned -6682.
Cannot create CVPixelBufferPool with kCVPixelBufferWidthKey value (0) <= 0.
Failed to create a CVPixelBufferPool for frame size 0 x 1 with pixel format type L00h because CVPixelBufferPoolCreate returned -6682.

We have checked at multiple breakpoints; the image shape is retained right up until the prediction function is called.

public func predict(image: CVPixelBuffer) -> [Prediction]? {
    let imageInput = ssd_mobilenet_v3_iOS16Input(input: image)
    if let output = try? model?.prediction(input: imageInput) {
        print(output, "OUTPUT")
        // ... post-processing of the output into [Prediction] elided
    }
    return nil
}

Please help me understand whether Core ML supports object detection models other than YOLO. The Core ML documentation only mentions conversion for classification and segmentation models as of today. Please also mention if there are any processing steps I should follow before or after the conversion.

Upvotes: 0

Views: 22

Answers (0)
