Thomas Bergmueller

Reputation: 221

OpenCV cvtColor() performance issue on iPhone 4(S)

I'm developing a cross-platform application in C++ at the moment, mostly targeted at Android and iOS. Overall it works pretty well and has excellent performance, but on the iPhone 4(S) it runs very, very slowly (see figures below).

The aim is to process a video stream at ~5-10 fps with a certain algorithm.

Among others, the code was tested successfully (5 or more processed frames per second) and profiled on the following devices:

However, as mentioned, it does not work on the iPhone 4 and iPhone 4S. Both of them process 1 frame every two seconds => 0.5 fps.

Of course, this seems a bit strange, since the code works on "weaker" devices like the Huawei and even the Nexus One (2 fps), so I started profiling performance and memory consumption with Instruments.

Memory usage of the app

Memory consumption is OK; at most 16 MB are used (as you can see from the image). However, the runtime profiling left me a bit shocked.

Runtime Profiling

And inverse call tree:

Runtime Profiling with inverse call tree

Now, as you can see, the CPU spends a huge share of the total runtime in the cvtColor() function (cv::RGB2RGB). Internally, the parallel_for implementation is used - could this be linked to the CPU not being suited for running that code? Or is it just the cv::RGB2RGB function that is implemented somewhat strangely in OpenCV, given that the BGR2GRAY conversion seems to run a lot faster?
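One way to test the parallel_for suspicion is to disable OpenCV's internal threading before the conversion and compare timings. A minimal diagnostic sketch (colorMat, gray and the conversion are the ones from the snippet below):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;

// Diagnostic sketch: run the conversion single-threaded. If cvtColor()
// is just as slow without the parallel framework, parallel_for is not
// the culprit. setNumThreads(0) disables OpenCV's threading; a negative
// value restores the default.
setNumThreads(0);
cvtColor(colorMat, gray, CV_BGR2GRAY);
setNumThreads(-1);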

I use the latest precompiled version of OpenCV, v2.4.9 for iOS. The piece of code in question does basically nothing but color conversion from BGRA to grayscale. It looks like this:

#include <opencv2/imgproc/imgproc.hpp>
using namespace cv;

Mat colorMat;
Mat gray;

colorMat = Mat(vHeight, vWidth, CV_8UC4, rImageData); // wraps the camera buffer; no data is copied
cvtColor(colorMat, colorMat, CV_BGRA2BGR); // drop the alpha channel
cvtColor(colorMat, gray, CV_BGR2GRAY);     // then convert to grayscale

Note that it is split into two conversions, since further processing needs both the RGB and the grayscale information - that's why it is not done in one conversion step.
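For comparison, here is a sketch of the single-conversion route; whether the later color stages could consume the BGRA Mat directly is an assumption that does not hold for my pipeline:

Mat colorMat(vHeight, vWidth, CV_8UC4, rImageData); // still no copy
Mat gray;
cvtColor(colorMat, gray, CV_BGRA2GRAY); // single pass, no intermediate BGR Mat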

Another side remark: I also tested the OpenCV for iOS samples (Chapter 12: Processing video), which delivered the following results (when started with a 30 fps capture rate):

My questions

Since the code works very well on a wide range of devices, including other iOS devices, I conclude that the problem has to be related to either the hardware or the software of the iPhone 4(S).

Does anybody have a clue about what's going wrong here? Has anybody experienced similar issues? I found only scarce information on the internet about people experiencing the same performance issues (e.g. here and here).

I'm aware of the fact that there are different video sizes, but two "simple" color conversions of a 1280x720 image are not supposed to take around 2 seconds, especially not on devices as recent as the iPhone 4(S)!

Any help, hints or experiences in this matter are highly appreciated!

Progress and further findings

Based on remi's comment, I experimented with alternative solutions. Unfortunately, I have to report that even the following (very trivial) approach does not work:

Mat colorMat, gray;
std::vector<Mat> channels;

AVDEBUG("starting", TAG, 1);
colorMat = Mat(vHeight, vWidth, CV_8UC4, rImageData); // no data is copied
AVDEBUG("first", TAG, 1);
split(colorMat, channels); // copies data into four single-channel Mats
AVDEBUG("intermediate " << colorMat.size(), TAG, 1);
// no BGRA2BGR conversion at all!
gray = channels[0]; // take the blue channel as gray
AVDEBUG("end", TAG, 1);

This produces the following output:

2014-07-24 09:07:41.763 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) Frame accepted (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 591)

2014-07-24 09:07:41.765 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) starting (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 636)

2014-07-24 09:07:41.771 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) first (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 641)

2014-07-24 09:07:44.599 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) intermediate [720 x 1280] (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 665)

2014-07-24 09:07:44.605 CheckIfReal[604:3d03] AvCore-Debug: (Debug, Tag=CoreManager) ending (/Users/tbergmueller/Documents/dev/AvCore/avcore/CoreManager.cpp, line 682)

Hence the Mat constructor is fast, because no data is copied (refer to the docs). However, in this code sample the split() function takes almost 3 seconds! Taking the blue channel as the gray Mat is fast again, since only a Mat header is created.

This once again indicates that something is wrong with the loop implementation, since split() copies data, and that copying is obviously done in a loop.
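For reference, the individual calls can also be timed directly on the device without Instruments, using OpenCV's tick counter. A minimal sketch (vHeight, vWidth and rImageData as above):

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <cstdio>
using namespace cv;

// Time a single conversion on the device; getTickCount() and
// getTickFrequency() are part of the OpenCV core API.
Mat colorMat(vHeight, vWidth, CV_8UC4, rImageData);
Mat gray;

int64 t0 = getTickCount();
cvtColor(colorMat, gray, CV_BGRA2GRAY);
double ms = (getTickCount() - t0) * 1000.0 / getTickFrequency();
printf("BGRA2GRAY: %.1f ms\n", ms);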

Upvotes: 4

Views: 1713

Answers (1)

Thomas Bergmueller

Reputation: 221

I'm going to resolve this one myself; thanks for the comments, which pushed me in the right direction!

As expected, and as the comments also suggested, 1280x720 px is simply too much data to process on the iPhone 4(S), so I had to find a workaround.

As most of you might know, image processing is mostly done on grayscale images. If images are captured as BGRA from the iPhone camera, this means first converting them to grayscale with CV_BGRA2GRAY (which would be possible with cv::cvtColor).

Now, as seen from the profiling, this conversion takes too long, so I had to get rid of it altogether. One option that is possible on the iPhone 4(S) is to configure the camera to capture not in BGRA mode but in a 420YpCbCr mode. There are Stack Overflow topics around on how to configure the camera correctly; for me, especially this and this were quite helpful.

Well, unfortunately the iPhone 4 only supports three pixel format types, namely 420v, 420f and BGRA. Using this information and the links above, I decided to go with kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange (which corresponds to 420v). The big benefit is that you then have the grayscale image (luma) in one image plane and the color information (chroma) in the other, and can access them separately.
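To illustrate the plane access, a minimal sketch of the capture callback (Objective-C++); the CoreVideo plane accessors are the standard API, and everything else about the callback is omitted:

// Inside captureOutput:didOutputSampleBuffer:fromConnection:, with the
// session configured for kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange.
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CVPixelBufferLockBaseAddress(pixelBuffer, 0);

void*  lumaBase   = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0); // plane 0 = Y
size_t lumaWidth  = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0);
size_t lumaHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0);
size_t lumaStride = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0);

// Wrap the luma plane as a grayscale Mat - no conversion, no copy.
cv::Mat gray((int)lumaHeight, (int)lumaWidth, CV_8UC1, lumaBase, lumaStride);

// ... run the detection on 'gray' here, while the buffer is locked ...

CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);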

The key idea is then to detect regions of interest in the grayscale image and apply the color space conversion only to those interesting pixels, which are generally far fewer than the complete image. By avoiding the full-frame conversion from color to grayscale and applying the color space conversion only to small regions of interest, the processing speed rises to ~10 frames per second for my algorithm on the iPhone 4, which is acceptable for the desired application.
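A sketch of the ROI-only conversion, under these assumptions: roi is a hypothetical, even-aligned rectangle found by the detector on the full-resolution gray Mat, and chroma wraps plane 1 of the same pixel buffer analogously to the luma wrap above; CV_YUV2BGR_NV12 is OpenCV's conversion code for this biplanar layout:

// Assemble a small NV12 buffer for the region of interest only and
// convert just those pixels. roi must have even x/y/width/height.
cv::Rect roi = detectedRegion;                 // hypothetical detector output
cv::Mat nv12(roi.height * 3 / 2, roi.width, CV_8UC1);

// Y rows of the ROI
gray(roi).copyTo(nv12(cv::Rect(0, 0, roi.width, roi.height)));
// interleaved CbCr rows (half vertical resolution)
cv::Mat chromaRoi = chroma(cv::Rect(roi.x, roi.y / 2, roi.width, roi.height / 2));
chromaRoi.copyTo(nv12(cv::Rect(0, roi.height, roi.width, roi.height / 2)));

cv::Mat colorRoi;
cv::cvtColor(nv12, colorRoi, CV_YUV2BGR_NV12); // BGR for the ROI only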

Upvotes: 3
