Reputation: 25934
I am trying to run inference on a JIT-traced model in C++, and currently the output I get in Python differs from the output I get in C++.
Initially I thought this was caused by the JIT model itself, but now I don't think so, as I spotted some small deviations in the input tensor in the C++ code. I believe I did everything as instructed by the documentation, so this might point to an issue in `torch::from_blob`. I'm not sure!
Therefore, to determine which is the case, here are the snippets in both Python and C++, plus the sample input to test them.
Here is the sample image:
For PyTorch, run the following snippet:
```python
import cv2
import torch
from PIL import Image
import math
import numpy as np

img = Image.open('D:/Codes/imgs/profile6.jpg')
width, height = img.size
scale = 0.6
sw, sh = math.ceil(width * scale), math.ceil(height * scale)
img = img.resize((sw, sh), Image.BILINEAR)
img = np.asarray(img, 'float32')

# preprocess it: HWC -> CHW, add a batch dimension, then normalize
img = img.transpose((2, 0, 1))
img = np.expand_dims(img, 0)
img = (img - 127.5) * 0.0078125
img = torch.from_numpy(img)
```
For C++:
```cpp
#include <cmath>
#include <iostream>
#include <string>

#include <torch/script.h>
#include <torch/torch.h>

#include <opencv2/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace torch::indexing;

void test15()
{
    std::string pnet_path = "D:/Codes//MTCNN/pnet.jit";
    cv::Mat img = cv::imread("D:/Codes/imgs/profile6.jpg");  // OpenCV loads pixels in BGR order
    int width = img.cols;
    int height = img.rows;
    float scale = 0.6f;
    int sw = int(std::ceil(width * scale));
    int sh = int(std::ceil(height * scale));

    // Bilinear resize (cv::INTER_LINEAR has the value 1)
    cv::resize(img, img, cv::Size(sw, sh), 0, 0, cv::INTER_LINEAR);

    // Wrap the cv::Mat buffer (HWC, uint8) without copying it
    auto tensor_image = torch::from_blob(img.data, { img.rows, img.cols, img.channels() }, at::kByte);
    tensor_image = tensor_image.permute({ 2, 0, 1 });  // HWC -> CHW
    tensor_image.unsqueeze_(0);                         // add a batch dimension
    tensor_image = tensor_image.toType(c10::kFloat).sub(127.5).mul(0.0078125);
    // to() returns its result instead of modifying in place, so assign it
    tensor_image = tensor_image.to(c10::DeviceType::CPU);
}
```
### Input comparison
Here are the tensor values in both Python and C++.
PyTorch input (`img[:, :, :10, :10]`):
```python
img: tensor([[
[[0.3555, 0.3555, 0.3477, 0.3555, 0.3711, 0.3945, 0.3945, 0.3867, 0.3789, 0.3789],
[ 0.3477, 0.3555, 0.3555, 0.3555, 0.3555, 0.3555, 0.3555, 0.3477, 0.3398, 0.3398],
[ 0.3320, 0.3242, 0.3320, 0.3242, 0.3320, 0.3398, 0.3398, 0.3242, 0.3164, 0.3242],
[ 0.2852, 0.2930, 0.2852, 0.2852, 0.2930, 0.2930, 0.2930, 0.2852, 0.2773, 0.2773],
[ 0.2539, 0.2617, 0.2539, 0.2617, 0.2539, 0.2148, 0.2148, 0.2148, 0.2070, 0.2070],
[ 0.1914, 0.1914, 0.1836, 0.1836, 0.1758, 0.1523, 0.1367, 0.1211, 0.0977, 0.0898],
[ 0.1367, 0.1211, 0.0977, 0.0820, 0.0742, 0.0586, 0.0273, -0.0195, -0.0742, -0.0820],
[-0.0039, -0.0273, -0.0508, -0.0664, -0.0898, -0.1211, -0.1367, -0.1523, -0.1758, -0.1758],
[-0.2070, -0.2070, -0.2148, -0.2227, -0.2148, -0.1992, -0.1992, -0.1836, -0.1680, -0.1680],
[-0.2539, -0.2461, -0.2383, -0.2305, -0.2227, -0.1914, -0.1836, -0.1758, -0.1680, -0.1602]],
[[0.8398, 0.8398, 0.8320, 0.8242, 0.8320, 0.8477, 0.8398, 0.8320, 0.8164, 0.8164],
[ 0.8320, 0.8242, 0.8164, 0.8164, 0.8086, 0.8008, 0.7930, 0.7852, 0.7695, 0.7695],
[ 0.7852, 0.7852, 0.7773, 0.7695, 0.7695, 0.7617, 0.7539, 0.7383, 0.7305, 0.7148],
[ 0.7227, 0.7070, 0.7070, 0.6992, 0.6914, 0.6836, 0.6836, 0.6680, 0.6523, 0.6367],
[ 0.6289, 0.6211, 0.6211, 0.6211, 0.6055, 0.5586, 0.5508, 0.5352, 0.5273, 0.5039],
[ 0.4805, 0.4727, 0.4648, 0.4648, 0.4570, 0.4180, 0.3945, 0.3633, 0.3477, 0.3164],
[ 0.3555, 0.3398, 0.3086, 0.2930, 0.2695, 0.2461, 0.2070, 0.1523, 0.1055, 0.0820],
[ 0.1367, 0.1133, 0.0820, 0.0508, 0.0273, -0.0117, -0.0352, -0.0508, -0.0820, -0.0898],
[-0.1211, -0.1289, -0.1445, -0.1602, -0.1602, -0.1523, -0.1523, -0.1367, -0.1367, -0.1289],
[-0.2070, -0.1992, -0.1992, -0.1992, -0.1992, -0.1680, -0.1680, -0.1602, -0.1523, -0.1445]],
[[0.9492, 0.9414, 0.9336, 0.9180, 0.9180, 0.9336, 0.9258, 0.9023, 0.8867, 0.9023],
[ 0.9258, 0.9258, 0.9102, 0.9023, 0.8945, 0.8789, 0.8633, 0.8477, 0.8320, 0.8398],
[ 0.8711, 0.8633, 0.8555, 0.8477, 0.8320, 0.8242, 0.8086, 0.7930, 0.7852, 0.7773],
[ 0.7852, 0.7773, 0.7617, 0.7539, 0.7461, 0.7305, 0.7148, 0.6992, 0.6914, 0.6836],
[ 0.6758, 0.6680, 0.6602, 0.6602, 0.6367, 0.5820, 0.5742, 0.5508, 0.5430, 0.5273],
[ 0.5117, 0.5117, 0.4961, 0.4883, 0.4727, 0.4336, 0.4102, 0.3711, 0.3477, 0.3242],
[ 0.3867, 0.3711, 0.3398, 0.3164, 0.2930, 0.2539, 0.2148, 0.1523, 0.1055, 0.0820],
[ 0.1680, 0.1445, 0.1055, 0.0742, 0.0352, -0.0039, -0.0273, -0.0586, -0.0820, -0.0898],
[-0.0898, -0.0977, -0.1211, -0.1367, -0.1445, -0.1445, -0.1445, -0.1445, -0.1445, -0.1445],
[-0.1758, -0.1680, -0.1680, -0.1680, -0.1680, -0.1523, -0.1523, -0.1602, -0.1602, -0.1523]]]])
```
C++/LibTorch tensor values (`img.index({Slice(), Slice(), Slice(None, 10), Slice(None, 10)})`):
```
img: (1,1,.,.) =
0.3555 0.3555 0.3555 0.3555 0.3555 0.4023 0.3945 0.3867 0.3789 0.3789
0.3633 0.3633 0.3555 0.3555 0.3555 0.3555 0.3477 0.3555 0.3398 0.3398
0.3398 0.3320 0.3320 0.3242 0.3398 0.3320 0.3398 0.3242 0.3242 0.3242
0.2930 0.2930 0.2852 0.2773 0.2852 0.2930 0.2852 0.2852 0.2773 0.2852
0.2695 0.2695 0.2617 0.2773 0.2695 0.2227 0.2227 0.2227 0.2148 0.2148
0.1914 0.1914 0.1914 0.1914 0.1914 0.1602 0.1445 0.1289 0.1055 0.0977
0.1289 0.1133 0.0820 0.0742 0.0586 0.0586 0.0195 -0.0273 -0.0820 -0.0898
0.0039 -0.0195 -0.0508 -0.0664 -0.0820 -0.1289 -0.1445 -0.1602 -0.1836 -0.1836
-0.2070 -0.2148 -0.2227 -0.2383 -0.2305 -0.2070 -0.2070 -0.1914 -0.1836 -0.1758
-0.2539 -0.2461 -0.2461 -0.2383 -0.2305 -0.1914 -0.1914 -0.1758 -0.1680 -0.1602
(1,2,.,.) =
0.8398 0.8398 0.8242 0.8164 0.8242 0.8555 0.8398 0.8320 0.8242 0.8242
0.8320 0.8320 0.8242 0.8242 0.8086 0.8008 0.7930 0.7773 0.7695 0.7617
0.7930 0.7852 0.7773 0.7695 0.7695 0.7695 0.7539 0.7461 0.7305 0.7227
0.7070 0.7070 0.6992 0.6992 0.6914 0.6836 0.6758 0.6602 0.6523 0.6367
0.6367 0.6367 0.6289 0.6289 0.6211 0.5664 0.5586 0.5430 0.5352 0.5117
0.4805 0.4805 0.4805 0.4648 0.4727 0.4258 0.4023 0.3711 0.3555 0.3320
0.3398 0.3320 0.3008 0.2773 0.2617 0.2461 0.1992 0.1445 0.0898 0.0586
0.1367 0.1211 0.0898 0.0508 0.0273 -0.0195 -0.0352 -0.0664 -0.0898 -0.1055
-0.1211 -0.1289 -0.1367 -0.1602 -0.1602 -0.1523 -0.1523 -0.1445 -0.1445 -0.1367
-0.2148 -0.2070 -0.2070 -0.2070 -0.1992 -0.1680 -0.1680 -0.1602 -0.1523 -0.1445
(1,3,.,.) =
0.9414 0.9414 0.9336 0.9180 0.9102 0.9336 0.9258 0.9023 0.8945 0.9023
0.9180 0.9180 0.9102 0.9102 0.8945 0.8711 0.8633 0.8555 0.8242 0.8477
0.8711 0.8711 0.8633 0.8477 0.8320 0.8164 0.8164 0.7930 0.7852 0.7852
0.7773 0.7773 0.7539 0.7461 0.7305 0.7148 0.7070 0.6992 0.6836 0.6758
0.6836 0.6836 0.6758 0.6680 0.6445 0.5898 0.5820 0.5586 0.5508 0.5352
0.5273 0.5195 0.5117 0.4883 0.4883 0.4414 0.4102 0.3789 0.3633 0.3398
0.3867 0.3633 0.3320 0.3008 0.2695 0.2539 0.2070 0.1445 0.0898 0.0664
0.1836 0.1523 0.1133 0.0742 0.0352 -0.0117 -0.0352 -0.0664 -0.0898 -0.1055
-0.0820 -0.0977 -0.1211 -0.1367 -0.1445 -0.1445 -0.1445 -0.1367 -0.1445 -0.1445
-0.1758 -0.1758 -0.1758 -0.1758 -0.1758 -0.1602 -0.1523 -0.1680 -0.1602 -0.1602
[ CPUFloatType{1,3,10,10} ]
```
By the way, these are the tensor values before being normalized/preprocessed:
Python:
```
img.shape: (3, 101, 180)
img: [
[[173. 173. 172. 173. 175.]
[172. 173. 173. 173. 173.]
[170. 169. 170. 169. 170.]
[164. 165. 164. 164. 165.]
[160. 161. 160. 161. 160.]]
[[235. 235. 234. 233. 234.]
[234. 233. 232. 232. 231.]
[228. 228. 227. 226. 226.]
[220. 218. 218. 217. 216.]
[208. 207. 207. 207. 205.]]
[[249. 248. 247. 245. 245.]
[246. 246. 244. 243. 242.]
[239. 238. 237. 236. 234.]
[228. 227. 225. 224. 223.]
[214. 213. 212. 212. 209.]]]
```
C++:
```
img.shape: [1, 3, 101, 180]
img: (1,1,.,.) =
173 173 173 173 173
174 174 173 173 173
171 170 170 169 171
165 165 164 163 164
162 162 161 163 162
(1,2,.,.) =
235 235 233 232 233
234 234 233 233 231
229 228 227 226 226
218 218 217 217 216
209 209 208 208 207
(1,3,.,.) =
248 248 247 245 244
245 245 244 244 242
239 239 238 236 234
227 227 224 223 221
215 215 214 213 210
[ CPUByteType{1,3,5,5} ]
```
As you can see, they might look identical at first glance, but on closer inspection there are many small deviations in the input! How can I avoid these differences and get exactly the same values in C++?
I wonder what is causing this weird phenomenon!
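To put a number on "small deviations", here is a minimal sketch that compares the first 5x5 patch of the first channel from the two raw dumps above. The array names `py_patch` and `cpp_patch` are only illustrative, and the values are copied by hand from the printed output, so this covers just that patch and not the whole image.

```python
import numpy as np

# First 5x5 patch of the first channel, copied from the raw Python dump.
py_patch = np.array([
    [173, 173, 172, 173, 175],
    [172, 173, 173, 173, 173],
    [170, 169, 170, 169, 170],
    [164, 165, 164, 164, 165],
    [160, 161, 160, 161, 160],
], dtype=np.int16)

# The same patch, copied from the raw C++ dump.
cpp_patch = np.array([
    [173, 173, 173, 173, 173],
    [174, 174, 173, 173, 173],
    [171, 170, 170, 169, 171],
    [165, 165, 164, 163, 164],
    [162, 162, 161, 163, 162],
], dtype=np.int16)

# A signed dtype avoids uint8 wrap-around when subtracting.
diff = np.abs(py_patch - cpp_patch)
max_abs_diff = int(diff.max())
num_mismatched = int(np.count_nonzero(diff))

# One raw intensity step equals 0.0078125 after (x - 127.5) * 0.0078125.
max_normalized_diff = max_abs_diff * 0.0078125

print(max_abs_diff, num_mismatched, max_normalized_diff)
```

On this patch the raw values differ by at most 2 intensity levels, which is consistent with the offsets of one or two multiples of 0.0078 visible in the normalized tensors.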
Upvotes: 6
Views: 8953
Reputation: 25934
It has become clear that this is indeed an input issue. More specifically, it happens because the image is first read with `PIL.Image.open` in Python and later converted into a NumPy array. If the image is read with OpenCV instead, then everything is the same input-wise in both Python and C++.
However, in my specific case, using the OpenCV image results in a minor change in the final result. The only way to minimize this difference is to convert the OpenCV image to grayscale before feeding it to the network, in which case the PIL input and the OpenCV input produce nearly identical outputs.
Here are the two examples; the PIL image is BGR and the OpenCV image is in grayscale mode. You need to save them to disk to see that they are nearly identical (left is cv_image, right is pil_image):
However, if I don't convert the OpenCV image to grayscale (and back to BGR to get 3 channels), this is how it looks (left is cv_image, right is pil_image):
This turned out to be input-related again. The slight differences arose because the model was trained on RGB images, so the channel order mattered. When using the PIL image, several back-and-forth conversions happened across different methods, which caused the mess described above.
To cut a long story short, there was no issue with the conversion from `cv::Mat` into a `torch::Tensor` or vice versa. The issue was that the images were created and fed to the network differently in Python and C++. When both the Python and C++ backends used OpenCV for handling images, their outputs matched 100%.
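As a rough illustration of the channel-order point, the sketch below applies the same normalization as the question's code to an OpenCV-style BGR array after reordering it to RGB. The function name `preprocess_bgr` and the synthetic `sample` image are hypothetical, and whether a given model wants RGB or BGR depends on how it was trained, so treat the reordering step as an assumption to verify against your own model.

```python
import numpy as np

def preprocess_bgr(img_bgr: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 BGR image into 1x3xHxW float32 RGB data.

    Hypothetical helper: assumes the network expects RGB channel order.
    """
    img_rgb = img_bgr[:, :, ::-1]                  # BGR -> RGB
    chw = img_rgb.transpose(2, 0, 1)               # HWC -> CHW
    nchw = np.expand_dims(chw, 0).astype(np.float32)
    normalized = (nchw - 127.5) * 0.0078125        # same scaling as the question
    # A contiguous array is the safest thing to hand to torch.from_numpy.
    return np.ascontiguousarray(normalized)

# Synthetic 1x2 image in BGR order: one pure-blue pixel, one pure-red pixel.
sample = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
out = preprocess_bgr(sample)
print(out.shape, out.dtype)
```

Running the same function on the array returned by `cv2.imread` in Python, and mirroring the same steps in C++, keeps the two pipelines on an identical decoder and channel layout, which is what made the outputs match here.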
Upvotes: 4