Reputation: 495
I've downloaded Apple's TrueDepth streamer example and am trying to add a compute pipeline. I think I'm retrieving the results of the computation, but I'm not sure, as they all seem to be zero.
I'm a beginner at iOS development, so there may be quite a few mistakes; please bear with me!
The pipeline setup (I wasn't quite sure how to create the results buffer, since the kernel outputs a float3):
int resultsCount = CVPixelBufferGetWidth(depthFrame) * CVPixelBufferGetHeight(depthFrame);
// because I will output 3 floats for each value in depthFrame
id<MTLBuffer> resultsBuffer = [self.device newBufferWithLength:(sizeof(float) * 3 * resultsCount) options:MTLResourceOptionCPUCacheModeDefault];
_threadgroupSize = MTLSizeMake(16, 16, 1);
// Calculate the number of rows and columns of threadgroups given the width of the input image
// Ensure that you cover the entire image (or more) so you process every pixel
_threadgroupCount.width = (inTexture.width + _threadgroupSize.width - 1) / _threadgroupSize.width;
_threadgroupCount.height = (inTexture.height + _threadgroupSize.height - 1) / _threadgroupSize.height;
// Since we're only dealing with a 2D data set, set depth to 1
_threadgroupCount.depth = 1;
id<MTLComputeCommandEncoder> computeEncoder = [commandBuffer computeCommandEncoder];
[computeEncoder setComputePipelineState:_computePipelineState];
[computeEncoder setTexture: inTexture atIndex:0];
[computeEncoder setBuffer:resultsBuffer offset:0 atIndex:1];
[computeEncoder setBytes:&intrinsics length:sizeof(intrinsics) atIndex:0];
[computeEncoder dispatchThreadgroups:_threadgroupCount
threadsPerThreadgroup:_threadgroupSize];
[computeEncoder endEncoding];
// Finalize rendering here & push the command buffer to the GPU
[commandBuffer commit];
//for testing
[commandBuffer waitUntilCompleted];
I have added the following compute kernel:
kernel void
calc(texture2d<float, access::read> inTexture [[texture(0)]],
device float3 *resultsBuffer [[buffer(1)]],
constant float3x3& cameraIntrinsics [[ buffer(0) ]],
uint2 gid [[thread_position_in_grid]])
{
float val = inTexture.read(gid).x * 1000.0f;
float xrw = (gid.x - cameraIntrinsics[2][0]) * val / cameraIntrinsics[0][0];
float yrw = (gid.y - cameraIntrinsics[2][1]) * val / cameraIntrinsics[1][1];
int vertex_id = ((gid.y * inTexture.get_width()) + gid.x);
resultsBuffer[vertex_id] = float3(xrw, yrw, val);
}
Code for inspecting the buffer contents (I tried two different ways, and both output all zeroes at the moment):
void *output = [resultsBuffer contents];
for (int i = 0; i < 10; ++i) {
NSLog(@"value is %f", *(float *)(output) ); //= *(float *)(output + 4 * i);
}
NSData *data = [NSData dataWithBytesNoCopy:resultsBuffer.contents length:(sizeof(float) * 3 * resultsCount) freeWhenDone:NO];
float *finalArray = new float [resultsCount * 3];
[data getBytes:&finalArray[0] length:sizeof(finalArray)];
for (int i = 0; i < 10; ++i) {
NSLog(@"here is output %f", finalArray[i]);
}
Upvotes: 1
Views: 804
Reputation: 31782
I see a couple of problems here, but neither of them is related to your Metal code per se.
In your first output loop, as written, you're just printing the first element of the results buffer 10 times. The first element may legitimately be 0, leading you to believe all of the results are zero. But when I changed the first log line to
NSLog(@"value is %f", ((float *)output)[i]);
I saw different values printed when running your kernel on a test image.
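To make the difference concrete, here is a minimal sketch in plain C++ that contrasts the two loops. The function names readFirstElementRepeatedly and readEachElement are hypothetical, and an ordinary float array stands in for the pointer returned by the buffer's contents, so nothing here touches Metal itself.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the original loop: the base pointer is dereferenced on every
// iteration, so the value of element 0 is returned n times.
std::vector<float> readFirstElementRepeatedly(const void *output, std::size_t n) {
    std::vector<float> values;
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(*(const float *)output);
    }
    return values;
}

// Corrected loop: cast the untyped pointer once, then index it with i.
std::vector<float> readEachElement(const void *output, std::size_t n) {
    const float *floats = static_cast<const float *>(output);
    std::vector<float> values;
    for (std::size_t i = 0; i < n; ++i) {
        values.push_back(floats[i]);
    }
    return values;
}
```

If the first float in the buffer happens to be 0, the first function reports nothing but zeroes even when every other element is non-zero, which matches the symptom described in the question.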
The other issue is related to your getBytes:length: call. You want to pass the number of bytes to copy, but sizeof(finalArray) is actually the size of the finalArray pointer itself (8 bytes on a 64-bit iOS device), not the total size of the buffer it points to. This is an extremely common error in C and C++ code.
Instead, you can use the same byte count as the one you used when allocating space:
[data getBytes:&finalArray[0] length:(sizeof(float) * 3 * resultsCount)];
You should then find that you get the same (non-zero) values printed as in the previous step.
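The same pitfall can be reproduced outside of Foundation. The sketch below is plain C++ with hypothetical helper names (resultsByteCount, copyResults, pointerSize), and it uses std::memcpy as a stand-in for getBytes:length:, so treat it as an illustration of the sizeof behavior rather than of the NSData API.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Byte count for `count` results of three floats each, using the same
// expression as the one used when the buffer was allocated.
std::size_t resultsByteCount(std::size_t count) {
    return sizeof(float) * 3 * count;
}

// Copies every result out of `source` by passing the full byte count.
// (Stand-in for the getBytes:length: call; not the real NSData API.)
std::vector<float> copyResults(const void *source, std::size_t count) {
    std::vector<float> dest(count * 3);
    std::memcpy(dest.data(), source, resultsByteCount(count));
    return dest;
}

// sizeof applied to a pointer variable yields the size of the pointer,
// regardless of how much memory was allocated behind it.
std::size_t pointerSize() {
    float *finalArray = nullptr;
    return sizeof(finalArray);
}
```

Comparing pointerSize() with resultsByteCount(resultsCount) shows why only the first few bytes were copied in the original code, leaving the rest of finalArray uninitialized.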
Upvotes: 2