Tae-Sung Shin

Reputation: 20643

Faster method of accessing a channel from RGB image in OpenCV?

In my trials with images of 1409x900 and 960x696, it takes 2.5 ms on average to split the channels of an RGB image using OpenCV on my 64-bit 6-core 3.2 GHz Windows machine.

vector<cv::Mat> channels;
cv::split(img, channels);

I found that this takes almost as much time as the rest of my image processing (boolean operations + morphological opening).

Since my code only uses one channel from the split, I wonder whether there is a faster way of extracting a single channel from an RGB image, preferably with OpenCV.

UPDATE

As @DanMašek pointed out, there is another function, mixChannels, that can extract a single channel from a multi-channel image. I tested it on about 2000 images of the same sizes; mixChannels took about 1 ms on average. For now, I am satisfied with the result, but post your answer if you can make it faster.

cv::Mat channel(img.rows, img.cols, CV_8UC1);
int from_to[] = { sel_channel,0 };
mixChannels(&img, 1, &channel, 1, from_to, 1);

Upvotes: 2

Views: 2407

Answers (1)

Dan Mašek

Reputation: 19041

Two simple options come to mind here.

  1. You mention that you perform this operation repeatedly on images captured from a camera. Therefore it is safe to assume that the images are always the same size.

    Allocations of cv::Mat have a non-negligible overhead, so in this case it would be beneficial to reuse the channel Mats (i.e. allocate the destination images when you receive the first frame, and then just overwrite their contents for subsequent frames); see the sketch after this list.

    An additional benefit of this approach is (quite likely) reduced memory fragmentation, which can become a real problem for 32-bit code.

  2. You mention that you're interested in only one specific channel (which the user may select arbitrarily). That means you could use cv::mixChannels, which gives you the flexibility to select which channels you extract and how you extract them.

    That means you can extract the data for only a single channel, theoretically (depending on the implementation -- study the source code for more details) avoiding the overhead of extracting and/or copying the data for the channels you're not interested in.
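
To illustrate the reuse idea in context, here is a minimal sketch of a capture loop (a sketch only: it assumes a cv::VideoCapture source delivering fixed-size CV_8UC3 frames, and the function name process_stream is just a placeholder, not part of the benchmark below):

#include <opencv2/opencv.hpp>

#include <vector>

// Hypothetical helper -- processes frames from a camera, reusing the channel buffers.
void process_stream(cv::VideoCapture& cap, int sel_channel)
{
    cv::Mat frame;
    std::vector<cv::Mat> channels;  // reused across frames; reallocation is skipped once size/type match
    cv::Mat channel;                // reused output buffer for cv::mixChannels
    int from_to[] = { sel_channel, 0 };

    while (cap.read(frame)) {
        // Option 1: cv::split overwrites the existing channel Mats on subsequent frames.
        cv::split(frame, channels);

        // Option 2: cv::mixChannels needs a preallocated output; create() is a no-op
        // once 'channel' already has the right size and type.
        channel.create(frame.rows, frame.cols, CV_8UC1);
        cv::mixChannels(&frame, 1, &channel, 1, from_to, 1);

        // ... run the rest of the pipeline on channels[sel_channel] or channel ...
    }
}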


Let's write a test program evaluating the four possible combinations of the approaches outlined above.

  • Variant 0: cv::split without reuse
  • Variant 1: cv::split with reuse
  • Variant 2: cv::mixChannels without reuse
  • Variant 3: cv::mixChannels with reuse

NB: I just use static for simplicity here; usually I'd make this a member variable in a class that wraps the algorithm.


#include <opencv2/opencv.hpp>

#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

#define SELECTED_CHANNEL 1

cv::Mat variant_0(cv::Mat const& img)
{
    std::vector<cv::Mat> channels;
    cv::split(img, channels);
    return channels[SELECTED_CHANNEL];
}

cv::Mat variant_1(cv::Mat const& img)
{
    static std::vector<cv::Mat> channels;
    cv::split(img, channels);
    return channels[SELECTED_CHANNEL];
}

cv::Mat variant_2(cv::Mat const& img)
{
    // NB: output Mat must be preallocated
    cv::Mat channel(img.rows, img.cols, CV_8UC1);
    int from_to[] = { SELECTED_CHANNEL, 0 };
    cv::mixChannels(&img, 1, &channel, 1, from_to, 1);
    return channel;
}

cv::Mat variant_3(cv::Mat const& img)
{
    // NB: output Mat must be preallocated
    static cv::Mat channel(img.rows, img.cols, CV_8UC1);
    int from_to[] = { SELECTED_CHANNEL, 0 };
    cv::mixChannels(&img, 1, &channel, 1, from_to, 1);
    return channel;
}

template<typename T>
void timeit(std::string const& title, T f)
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::microseconds;

    cv::Mat img(1024,1024, CV_8UC3);
    cv::randu(img, 0, 256);

    int32_t const STEPS(1024);

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (uint32_t i(0); i < STEPS; ++i) {
        cv::Mat result = f(img);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();

    auto duration = duration_cast<microseconds>(t2 - t1).count();
    double t_ms(static_cast<double>(duration) / 1000.0);
    std::cout << title << "\n"
        << "Total = " << t_ms << " ms\n"
        << "Iteration = " << (t_ms / STEPS) << " ms\n"
        << "FPS = " << (STEPS / t_ms * 1000.0) << "\n"
        << "\n";
}

int main()
{
    for (uint8_t i(0); i < 2; ++i) {
        timeit("Variant 0", variant_0);
        timeit("Variant 1", variant_1);
        timeit("Variant 2", variant_2);
        timeit("Variant 3", variant_3);
        std::cout << "--------------------------\n\n";
    }

    return 0;
}

Output for the second pass (so we avoid any warmup costs).

Note: running this on an i7-4930K, using OpenCV 3.1.0 (64-bit, MSVC 12.0), Windows 10 -- YMMV, especially with CPUs that have AVX2.

Variant 0
Total = 1518.69 ms
Iteration = 1.48309 ms
FPS = 674.267

Variant 1
Total = 359.048 ms
Iteration = 0.350633 ms
FPS = 2851.99

Variant 2
Total = 820.223 ms
Iteration = 0.800999 ms
FPS = 1248.44

Variant 3
Total = 427.089 ms
Iteration = 0.417079 ms
FPS = 2397.63

Interestingly, cv::split with reuse wins here. Feel free to edit the answer and add timings from different platforms/CPU generations (especially if the proportions differ radically).

It also seems that, on my setup, none of this parallelizes particularly well, so writing your own parallel loop (something like cv::parallel_for_) may be another possible path to speeding this up; a rough sketch follows.
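
As a rough, unbenchmarked sketch of that direction (it uses the functor-based cv::ParallelLoopBody interface; the CV_8UC3 layout assumption and the helper name extract_channel_parallel are illustrative, not part of the measurements above):

#include <opencv2/opencv.hpp>

// Copies one channel of a CV_8UC3 image row-by-row; rows are distributed
// across threads by cv::parallel_for_.
class ExtractChannelBody : public cv::ParallelLoopBody
{
public:
    ExtractChannelBody(cv::Mat const& src, cv::Mat& dst, int channel)
        : src_(src), dst_(dst), channel_(channel) {}

    void operator()(cv::Range const& range) const override
    {
        for (int row(range.start); row < range.end; ++row) {
            uchar const* s(src_.ptr<uchar>(row));
            uchar* d(dst_.ptr<uchar>(row));
            for (int col(0); col < src_.cols; ++col) {
                d[col] = s[col * 3 + channel_];  // pick the selected byte out of each BGR triplet
            }
        }
    }

private:
    cv::Mat const& src_;
    cv::Mat& dst_;
    int channel_;
};

// Hypothetical helper wrapping the parallel loop.
cv::Mat extract_channel_parallel(cv::Mat const& img, int sel_channel)
{
    cv::Mat channel(img.rows, img.cols, CV_8UC1);
    cv::parallel_for_(cv::Range(0, img.rows), ExtractChannelBody(img, channel, sel_channel));
    return channel;
}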

Upvotes: 5
