Reputation: 121
I have one array of uint8_t values, my goal is to copy every 3 bytes to a dst array, but the catch is that I iterate from 4 to 4 bytes in the dst array, just like shown below.
src = {1,2,3,4,5,6};
dst = {0,0,0,0,0,0,0,0};
...
dst = {1,2,3,0,4,5,6,0}
For now i'm using the following code to perform this task.
for(int i=0; i<arr_size ; i++)
memcpy(dst + i*4, arr_ptr + i*3, 3);
Is there are faster/efficient way to do this?
Edit for more context:
I have the following struct that's needed to be populated with data from an image array, where a
will always be initialized with 0.
typedef struct {unsigned char r,g,b,a} uchar4;
...
// init dst
...
*dst = (uchar4 *)malloc(height * width * sizeof(uchar4));
By assigning values to the uchar4 array doing struct.variable = value
, takes a lot of time, which led me to think that it would be faster to just copy the values from the image array, that stores uint8_t values, to the uchar4 array, since uchar and uint8 occupy 1 byte in memory. This way the structs array is initialized with 0's and every 3 bytes from the flattened image is paste every 4 bytes in the uchar arr.
Edit2: code corrections
Upvotes: 1
Views: 1077
Reputation: 145277
There are many ways to try and optimize your conversion loop. As recommended by 0___________, you should consider memcpy
for sized sized chunks as most optimizers will generate very efficient code for the target platform, beating hand coded naive alternatives.
Here is a quick benchmark comparing 3 methods:
memcpy
for this copyOther methods can be added, such as trying to take advantage of SIMD instructions, which should provide significant performance improvements at the expense of portability.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
typedef struct rgb {
uint8_t r, g, b;
} rgb;
typedef struct rgba {
uint8_t r, g, b, a;
} rgba;
void copy3to4_simple(void *to, const void *from, size_t count) {
const uint8_t *src = from;
uint8_t *dst = to;
uint8_t *end = dst + count * 4;
while (dst < end) {
dst[0] = src[0];
dst[1] = src[1];
dst[3] = src[2];
dst += 4;
src += 3;
}
}
void copy3to4_memcpy(void *to, const void *from, size_t count) {
const uint8_t *src = from;
uint8_t *dst = to;
for (size_t i = 0; i < count; i++) {
memcpy(dst + i * 4, src + i * 3, 3);
}
}
int main() {
int width = 1920, height = 1080;
rgb *src = calloc(sizeof(*src), width * height);
rgba *dst = calloc(sizeof(*dst), width * height);
const char *name[10];
clock_t c[10];
int n = 0;
#define RUNS 100
name[n] = "simple";
for (int i = 0; i < RUNS + 10; i++) {
if (i == 10)
c[n] = -clock();
copy3to4_simple(dst, src, width * height);
}
c[n++] += clock();
name[n] = "memcpy";
for (int i = 0; i < RUNS + 10; i++) {
if (i == 10)
c[n] = -clock();
copy3to4_memcpy(dst, src, width * height);
}
c[n++] += clock();
for (int i = 0; i < n; i++) {
printf("%s: %.3f msec\n", name[i], c[i] * 1000. / CLOCKS_PER_SEC / RUNS);
}
free(src);
free(dst);
return 0;
}
Running this on my old Macbook, I get this:
simple: 2.478 msec
memcpy: 1.840 msec
memcpy
beats simple
by 25%, but you might get a different result on a different architecture.
Upvotes: 2
Reputation: 67835
I assume that arr_size
is the number of triplets to be copied.
for(size_t i=0; i<arr_size ; i++)
memcpy(dst + i*3, src + i*4, 3);
it is wrong it has to be
for(size_t i=0; i<arr_size ; i++)
memcpy(dst + i*4, src + i*3, 3);
Now the context.
typedef struct {unsigned char r,g,b,a} uchar4;
It is not guaranteed that the compiler will not add any padding. And any pointer punning might not work correctly. Add static assert to check the size of your structure is 4, if not use you will need to use some compiler extension to pack the struct.
Efficiency:
It is very difficult to judge but the trivial functions with code from the answers here show that it is very likely the memcpy
version to be the most efficient.
I have tried to remove one memory access and wrote this very bad (it invokes UBs! in general, but it will work on X86 and Cortex-M3 and newer). It was done just for the curiosity: ( WARNING!! Graphic programming content!!! Not suitable for all audience)https://godbolt.org/z/Pefc6T
Upvotes: 2
Reputation: 5893
You don't need memcpy
at all. Just with pointer arithmetic, you could do the following:
uint8_t *src = some_values;
uint8_t *end = src + some_values_size;
uint8_t *dst = some_buffer;
for (; src < end; src += 3, dst += 4) {
dst[0] = src[0];
dst[1] = src[1];
dst[2] = src[2];
}
With the example above, you could define the code as a macro and use it for different datatypes. memcpy
wants to know how many bytes it has to copy, therefore you need a type.
Note: The code assumes that the length of the array src
is a multiple of 3 and the length of the array dst
equals to: (length(src) / 3) * 4
.
Upvotes: 0