merrick_s

Reputation: 59

OpenGL GLPaint threaded rendering

I am currently using a library based on Apple's GLPaint example for drawing on screen in OpenGL ES. Currently, whenever the canvas saves and restores a session, the lines are redrawn one at a time (you can see the progress on screen), and it takes quite a bit of time when there are a lot of points to render. Is there any way to make this render in parallel, or at least faster?

This is the drawing code I'm using:

CGPoint start = step.start;
CGPoint end = step.end;

// Convert touch point from UIView referential to OpenGL one (upside-down flip)
CGRect bounds = [self bounds];
start.y = bounds.size.height - start.y;
end.y = bounds.size.height - end.y;

static GLfloat*     vertexBuffer = NULL;
static NSUInteger   vertexMax = 64;
NSUInteger          vertexCount = 0, count, i;

[EAGLContext setCurrentContext:context];
glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);

// Convert locations from Points to Pixels
CGFloat scale = self.contentScaleFactor;
start.x *= scale;
start.y *= scale;
end.x *= scale;
end.y *= scale;

// Allocate vertex array buffer
if(vertexBuffer == NULL)
    vertexBuffer = malloc(vertexMax * 2 * sizeof(GLfloat));

// Add points to the buffer so there are drawing points every X pixels
count = MAX(ceilf(sqrtf((end.x - start.x) * (end.x - start.x) + (end.y - start.y) * (end.y - start.y)) / kBrushPixelStep), 1);
for(i = 0; i < count; ++i) {
    if(vertexCount == vertexMax) {
        vertexMax = 2 * vertexMax;
        vertexBuffer = realloc(vertexBuffer, vertexMax * 2 * sizeof(GLfloat));
    }

    vertexBuffer[2 * vertexCount + 0] = start.x + (end.x - start.x) * ((GLfloat)i / (GLfloat)count);
    vertexBuffer[2 * vertexCount + 1] = start.y + (end.y - start.y) * ((GLfloat)i / (GLfloat)count);
    vertexCount += 1;
}

// Render the vertex array
glVertexPointer(2, GL_FLOAT, 0, vertexBuffer);
glDrawArrays(GL_POINTS, 0, (int)vertexCount);

// Display the buffer
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];

Upvotes: 1

Views: 166

Answers (2)

Reto Koradi

Reputation: 54572

I believe the discussion in the comments above revealed the main part of your performance problem. Unless I completely misunderstood it, the high-level structure of your code currently looks like this:

loop over steps
    calculate list of points from start/end points
    render list of points
    present the renderbuffer
end loop

It should be massively faster to present the renderbuffer only after all the steps have been rendered:

loop over steps
    generate list of points from start/end points
    draw list of points
end loop
present the renderbuffer
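
As a rough sketch of that structure using the code from the question (the steps collection, the Step class, and a renderLineFromPoint:toPoint: drawing method are assumptions about how the surrounding code is organized; the drawing method must not call presentRenderbuffer itself):

[EAGLContext setCurrentContext:context];
glBindFramebufferOES(GL_FRAMEBUFFER_OES, viewFramebuffer);

// Draw every step without presenting in between
for (Step *step in steps) {
    [self renderLineFromPoint:step.start toPoint:step.end];
}

// Present once, after all steps have been drawn
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];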

Even better, generate a Vertex Buffer Object (aka VBO) for each step as part of creating it, and store the coordinates of the points for the step in the buffer. Then your draw logic becomes:

loop over steps
    bind VBO for step
    draw content of VBO
end loop
present the renderbuffer
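
A minimal sketch of that approach in OpenGL ES 1.1, assuming a Step object with hypothetical vboId and pointCount properties, and that vertexBuffer/vertexCount were built as in the question:

// When a step is created: upload its points into a VBO once
GLuint vbo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertexCount * 2 * sizeof(GLfloat), vertexBuffer, GL_STATIC_DRAW);
step.vboId = vbo;
step.pointCount = vertexCount;

// Draw pass: bind and draw each step's VBO, then present once
for (Step *step in steps) {
    glBindBuffer(GL_ARRAY_BUFFER, step.vboId);
    glVertexPointer(2, GL_FLOAT, 0, 0);
    glDrawArrays(GL_POINTS, 0, (int)step.pointCount);
}
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindRenderbufferOES(GL_RENDERBUFFER_OES, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER_OES];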

Upvotes: 0

Duncan C

Reputation: 131398

OpenGL is not multi-threaded: all the commands for a given context have to be submitted from a single thread.

You have a couple of choices:

  1. You can factor your code to use concurrency to build the data that you send to OpenGL, then submit it to the OpenGL API once it is all available.

  2. You can refactor it to do your calculations using shaders. This pushes the computation off the CPU and onto the GPU, which is highly optimized for parallel operation.

Your code above uses realloc to reallocate a buffer repeatedly inside the for loop. This is dreadfully inefficient, since memory allocation is one of the slowest memory-related operations on a modern OS. You should refactor your code to calculate the final size of your buffer up front, allocate the buffer at its final size once, and not use realloc at all. This should give you a many-fold increase in speed with very little effort.
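
A minimal sketch of that change, reusing the start, end, and kBrushPixelStep values from the question (hypotf replaces the explicit sqrtf distance calculation; it does the same thing):

// Compute the number of points first, then size the buffer once; no realloc in the loop
NSUInteger count = MAX(ceilf(hypotf(end.x - start.x, end.y - start.y) / kBrushPixelStep), 1);
GLfloat *vertexBuffer = malloc(count * 2 * sizeof(GLfloat));

for (NSUInteger i = 0; i < count; ++i) {
    GLfloat t = (GLfloat)i / (GLfloat)count;
    vertexBuffer[2 * i + 0] = start.x + (end.x - start.x) * t;
    vertexBuffer[2 * i + 1] = start.y + (end.y - start.y) * t;
}

glVertexPointer(2, GL_FLOAT, 0, vertexBuffer);
glDrawArrays(GL_POINTS, 0, (int)count);
free(vertexBuffer);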

Glancing at your code, it should not be hard to refactor your for loop to break the vertex calculation into blocks and submit those blocks to GCD for concurrent processing (see the sketch below). The trick is to break the work into units that are large enough to benefit from parallel processing: there is a certain amount of overhead in setting up a task to run on a background queue, so each work unit needs to do enough work to make that overhead worthwhile.
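
A rough sketch of that idea using dispatch_apply, assuming the count, start, end, and preallocated vertexBuffer from the previous snippet (the block size is a made-up starting point you would want to tune):

size_t const blockSize = 4096;  // tune: large enough to outweigh the dispatch overhead
size_t blocks = (count + blockSize - 1) / blockSize;

// Fill disjoint ranges of the vertex buffer concurrently; dispatch_apply returns
// only after every block has finished, so the GL calls below stay on this thread.
dispatch_apply(blocks, dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(size_t b) {
    size_t first = b * blockSize;
    size_t last  = MIN(first + blockSize, (size_t)count);
    for (size_t i = first; i < last; ++i) {
        GLfloat t = (GLfloat)i / (GLfloat)count;
        vertexBuffer[2 * i + 0] = start.x + (end.x - start.x) * t;
        vertexBuffer[2 * i + 1] = start.y + (end.y - start.y) * t;
    }
});

// Only now touch OpenGL, from the original thread
glVertexPointer(2, GL_FLOAT, 0, vertexBuffer);
glDrawArrays(GL_POINTS, 0, (int)count);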

Upvotes: 1
