Reputation: 661
I have a large stack of 800 16-bit grayscale images, 2048x2048 px each. They are read from a single BigTIFF file and the whole stack barely fits into my RAM (8GB). Now I need to do a median projection. That means I want to compute the median of each pixel across all 800 frames. The Matlab median function fails because there is not enough memory left to make a copy of the whole array for the function call. What would be an efficient way to compute the median?
I have tried using a for loop to compute the median one pixel at a time, but this is still terribly slow.
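Roughly, the attempt was this (a sketch; stack stands for the 2048x2048x800 uint16 array already in RAM):
medProj = zeros(2048, 2048, 'uint16');
for r = 1:2048
    for c = 1:2048
        medProj(r,c) = median(stack(r,c,:));  % one median call per pixel
    end
end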
Upvotes: 5
Views: 607
Reputation: 921
If you have access to the Image Processing Toolbox, Matlab has a set of tools to handle large images, built around the blockproc function.
From the docs:
To avoid these problems, you can process large images incrementally: reading, processing, and finally writing the results back to disk, one region at a time. The blockproc function helps you with this process.
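A minimal usage sketch (the file name and the per-block operation here are placeholders of mine, not from the docs): process a large TIFF in 512x512 tiles without ever loading it whole.
fun = @(blk) medfilt2(blk.data, [3 3]);           % any per-block operation on blk.data
out = blockproc('bigimage.tif', [512 512], fun);  % reads and writes tile by tile
Note that blockproc tiles a single image; for a median across 800 frames you would still need to read the matching region from every frame, e.g. with imread's PixelRegion option mentioned in another answer.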
Upvotes: 1
Reputation: 11795
I bet that the distributions of the individual pixel values over the stack (i.e. the histograms of the pixel jets) are sparse.
If that's the case, the amount of memory needed to keep all the pixel histograms is much less than 2K x 2K x 64K: you can use a compact hash map to represent each histogram and update them while loading the images one at a time. When all updates are done, you go through your histograms and compute the median of each.
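For a single pixel jet this could look as follows (my own sketch of the idea; a sparse vector stands in for the hash map, and vals stands in for one pixel's 800 values):
vals = randi([0, 65535], 800, 1);            % stand-in for one pixel across 800 frames
h = sparse(vals + 1, 1, 1, 65536, 1);        % compact histogram: only occupied levels cost memory
[levels, ~, counts] = find(h);               % occupied gray levels, ascending
s = cumsum(counts);
lo = levels(find(s >= 400, 1, 'first')) - 1; % level holding the 400th-smallest value
hi = levels(find(s >= 401, 1, 'first')) - 1; % level holding the 401st-smallest value
med = (lo + hi) / 2;                         % median of the 800 samples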
Upvotes: 1
Reputation: 2993
I will try my best to provide help (if any), because I don't have an 800-frame TIFF stack, nor an 8GB computer, but I want to see if my thoughts can form a solution.
First, 800*2048*2048*16bit = 6.4GB, not including the headers. That barely fits into your 8GB RAM, and there might be too many programs running and chopping up the contiguous memory. Anyway, let's treat the problem as if Matlab can't load the whole stack into memory.
As Jonas suggests, imread supports loading a TIFF image by index. It also supports a PixelRegion parameter, so you can also access parts of the image through this parameter if you want to utilize Shai's idea.
I came up with a median algorithm that doesn't need all the data at the same time; it merely scans through a sequence of unordered data, one value at a time, but it does keep a memory of 256 counters.
data = randi([0,255], 1, 800);               % stand-in for one pixel across 800 frames
bins = num2cell(zeros(256, 1, 'uint16'));    % one counter per gray level
for ii = 1:800
    bins{data(ii)+1} = bins{data(ii)+1} + 1; % tally this sample's gray level
end
% clearvars data                             % the raw data is no longer needed
s = cumsum(cell2mat(bins));                  % cumulative counts over the levels
if any(s == 400)                             % 400th and 401st values fall in different levels
    med = ( find(s == 400, 1, 'first') + ...
            find(s >  400, 1, 'first') ) / 2 - 1;
else                                         % both middle values share one level
    med = find(s > 400, 1, 'first') - 1;
end
It's not very efficient, at least because it uses a for loop. But the benefit is that instead of keeping 800 raw values in memory, only 256 counters are kept; the counters need uint16, so they take roughly as much memory as 512 raw 8-bit values. If you are confident that for any pixel the same gray level won't occur more than 255 times among the 800 samples, you can choose uint8 and halve the memory again.
The above code is for one pixel. I'm still thinking how to expand it to a 2048x2048 version, such as:
for ii = 1:800
    img_data = randi([0,255], 2048, 2048);   % stand-in for reading frame ii
    % (do stats stuff on img_data, i.e. update the per-pixel counters)
end
By doing so, for each iteration you only need to keep these in memory: one frame of the image; a set of counters; and a few supplemental variables comparable in size to one frame. One possible expansion is sketched below.
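One way to fill in the "stats stuff", under the same 256-level assumption as the toy code above (my own untested sketch; imread with the 'Index' option would replace the randi stand-in, and tiffFile is a placeholder name):
nFrames = 800;  H = 2048;  W = 2048;  nLevels = 256;
counts = zeros(H, W, nLevels, 'uint16');          % ~2 GB of per-pixel counters
[r, c] = ndgrid(1:H, 1:W);                        % pixel coordinates, reused every frame
for ii = 1:nFrames
    img = randi([0, nLevels-1], H, W);            % stand-in for imread(tiffFile, 'Index', ii)
    lin = sub2ind(size(counts), r(:), c(:), img(:) + 1);
    counts(lin) = counts(lin) + 1;                % one increment per pixel per frame
end
s = cumsum(counts, 3);                            % cumulative histograms (another ~2 GB)
lo = sum(s < nFrames/2, 3);                       % 0-based level holding the 400th value
hi = sum(s < nFrames/2 + 1, 3);                   % 0-based level holding the 401st value
medProj = (double(lo) + double(hi)) / 2;          % 2048x2048 median projection
If the two ~2 GB arrays are too much at once, the same update and extraction can be run on horizontal bands of rows, shrinking the counter array accordingly.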
Upvotes: 0
Reputation: 74940
Iterating over blocks, as @Shai suggests, may be the most straightforward solution. If you do have this problem frequently, you may want to consider converting the image to a mat-file, so that you can access the pixels as an n-d array directly from disk.
%# convert to mat file (numSlices = 800 frames; tiffFile is your BigTIFF)
matObj = matfile('dest.mat', 'Writable', true);
matObj.data(2048, 2048, numSlices) = uint16(0);  %# preallocate as uint16 to match the data
for t = 1:numSlices
    matObj.data(:,:,t) = imread(tiffFile, 'index', t);
end
%# load a block of the matfile to take median (run as part of a loop)
medianOfBlock = median(matObj.data(1:128,1:128,:),3);
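The surrounding loop could look like this (a sketch, assuming the 128x128 block size above, which divides 2048 evenly):
medProj = zeros(2048, 2048, 'uint16');
for rr = 1:128:2048
    for cc = 1:128:2048
        blk = matObj.data(rr:rr+127, cc:cc+127, :);       %# 128x128x800 block read from disk
        medProj(rr:rr+127, cc:cc+127) = median(blk, 3);   %# median projection of the block
    end
end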
Upvotes: 4