My objective is to develop an object detection model for iOS and Android. I have trained an SSD MobileNetV3 model with the PyTorch framework on a COCO-format dataset, following the official PyTorch documentation. The PyTorch model works fine and we can run inference with it.
Now I have to convert the PyTorch model to Core ML. I followed the Apple documentation and was able to convert the model successfully. On loading the model in Xcode, it throws pixel buffer errors at prediction time, even though a valid input shape is being passed to the model.
The code used for the PyTorch to Core ML conversion:
import torch
import torchvision
import numpy as np
import coremltools as ct
model_path = '<your-model-path>/checkpoint.pth' # Replace with your model file path
model = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights_backbone=None)
state_dict = torch.load(model_path, map_location="cpu", weights_only=False)["model"]
model.load_state_dict(state_dict)
model.eval()
# Wrapper for compatibility with torch.jit.trace
class SSDWrapper(torch.nn.Module):
    def __init__(self, model):
        super(SSDWrapper, self).__init__()
        self.model = model

    def forward(self, x):
        # Extract only the required tensor output (e.g., bounding boxes and scores)
        outputs = self.model(x)
        boxes = outputs[0]["boxes"]  # Bounding boxes
        scores = outputs[0]["scores"]  # Scores
        labels = outputs[0]["labels"]  # Class labels
        max_scores = torch.maximum(scores, torch.tensor(0.0, device=scores.device))
        return boxes, max_scores, labels
# Wrap the model
wrapped_model = SSDWrapper(model)
# Create a dummy input to trace the model
dummy_input = torch.rand(size=(1, 3, 320, 320)) # Adjust size based on your dataset
# Trace the model using torch.jit.trace
traced_model = torch.jit.trace(wrapped_model, dummy_input, strict=False)
traced_model.eval()
# Define preprocessing parameters
mean = np.array((0.485, 0.456, 0.406))
std = np.array((0.229, 0.224, 0.225))
scale = 1.0 / (0.226 * 255.0)
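# Core ML will apply y = scale * x + bias to each input pixel, so with 8-bit pixels
# this approximates (x / 255 - mean) / std, using 0.226 as an average of the three stds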
# Specify input as an image type for Core ML
image_input = ct.ImageType(
    name="input",
    shape=(1, 3, 320, 320),
    scale=scale,
    bias=-mean / std
)
# Convert the traced model to Core ML
mlmodel = ct.convert(
    traced_model,
    inputs=[image_input],
    minimum_deployment_target=ct.target.iOS16,
)
# Save the Core ML model
mlmodel.save("ssd_mobilenetv3.mlpackage")
print("Core ML model conversion complete.")
The model produced is, as expected, an .mlpackage, and it also loads in Xcode. However, when we try to run inference on it, it gives the following errors:
Cannot create CVPixelBufferPool with kCVPixelBufferHeightKey value (0) <= 0.
Faield to create a CVPixelBufferPool for frame size 4 x 0 with pixel format type L00h because CVPixelBufferPoolCreate returned -6682.
Cannot create CVPixelBufferPool with kCVPixelBufferWidthKey value (0) <= 0.
Faield to create a CVPixelBufferPool for frame size 0 x 1 with pixel format type L00h because CVPixelBufferPoolCreate returned -6682.
We have checked at multiple breakpoints; the image shape is retained until the prediction function is called.
public func predict(image: CVPixelBuffer) -> [Prediction]? {
    let imageInput = ssd_mobilenet_v3_iOS16Input(input: image)
    if let output = try? model?.prediction(input: imageInput) {
        print(output, "OUTPUT")
        // ... map output to [Prediction] here
    }
    return nil
}
Please help me understand whether Core ML supports object detection models other than YOLO. As of today, the Core ML documentation only mentions conversion of classification and segmentation models. Please also mention whether I should apply any processing steps before or after the conversion.