Hugh Warden

Reputation: 446

Efficiently mask an image with a label mask

I have an image that I read in with tifffile.imread and it is turned into a 3D matrix, with the first dimension representing the Y coordinate, the second the X and the third the channel of the image (these images are not RGB and so there can be an arbitrary number of channels).

Each of these images has a label mask which is a 2D array that indicates the position of objects in the image. In the label mask, pixels that have a value of 0 do not belong to any object, pixels that have a value of 1 belong to the first object, pixels that have a value of 2 belong to the second object and so on.
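
For example, a tiny illustrative label mask with two objects might look like this:

import numpy as np

# 0 = background, 1 = pixels of the first object, 2 = pixels of the second object
label_mask = np.array([
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 2],
])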

For each object and each channel of the image, I would like to calculate the mean, median, std, min and max of that channel. So, for example, I would like to know the mean, median, std, min and max values of the first channel for the pixels in object 10.

I have written code to do this (shown below), but it is very slow, and I wondered if anyone had a better way or knew of a package that might be helpful in making this faster/more efficient. (Here the word 'stain' means the same as channel.)

import numpy as np
import pandas as pd
from tifffile import imread

sample = imread(input_img)
label_mask = np.load(input_mask)

n_stains = sample.shape[2]
n_labels = np.max(label_mask)

#Create empty dataframe to store intensity measurements
intensity_measurements = pd.DataFrame(columns = ['sample', 'label', 'stain', 'mean', 'median', 'std', 'min', 'max'])

for label in range(1, n_labels+1):
    for stain in range(n_stains):
        #Extract stain and label
        stain_label = sample[:,:,stain][label_mask == label]

        #Calculate intensity measurements
        mean = np.mean(stain_label)
        median = np.median(stain_label)
        std = np.std(stain_label)
        min_val = np.min(stain_label)
        max_val = np.max(stain_label)

        #Add intensity measurements to dataframe (row-by-row append)
        intensity_measurements = intensity_measurements.append(
            {'sample': input_img, 'label': label, 'stain': stain,
             'mean': mean, 'median': median, 'std': std,
             'min': min_val, 'max': max_val},
            ignore_index=True
        )

Upvotes: 2

Views: 2002

Answers (3)

Hugh Warden

Reputation: 446

I've found a good solution that works for me using scikit-image, specifically the regionprops functions.

import numpy as np
import pandas as pd
from skimage.measure import regionprops, regionprops_table
np.random.seed(42)

Here is a random "image" and label mask of that image

img = np.random.randint(0, 255, size=(100, 100, 3))
mask = np.zeros((100, 100)).astype(np.uint8)
mask[20:50, 20:50] = 1
mask[65:70, 65:70] = 2

There is already a built-in property, mean_intensity, that measures the mean intensity of each channel and is very fast:

pd.DataFrame(regionprops_table(mask, img, properties=['label', 'mean_intensity']))

You can also pass custom functions to regionprops_table that take a binary mask and one channel of the intensity image:

def my_mean_func(mask, img):
    return np.mean(img[mask])

pd.DataFrame(regionprops_table(mask, img, properties=['label'], extra_properties=[my_mean_func]))

This is fast because the binary mask and intensity image passed to the custom function are cropped to the minimum bounding box of the region. Therefore, the computations are much faster, as they operate over a much smaller area.
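
One way to see this (using the same mask and img from above) is to return the size of the array the custom function receives, which corresponds to each region's bounding box rather than to the full 100x100 image:

def crop_size(mask, img):
    # mask here is cropped to the region's bounding box:
    # 30*30 = 900 pixels for object 1 and 5*5 = 25 for object 2,
    # instead of the full 100*100 = 10000 pixels of the image
    return mask.size

pd.DataFrame(regionprops_table(mask, img, properties=['label', 'area'],
                               extra_properties=[crop_size]))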

This only allows the user to calculate values per channel, but there is a generalisation, regionprops, that returns the selected region as a 3D matrix, so that between-channel measurements (or any other measurements you like) can be made.

props = regionprops(mask, img)

for prop in props:
    print("Region ", prop['label'], ":")
    print("Mean intensity: ", prop['mean_intensity'])
    print()

This is only an example of the very basic functionality.
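
For example, the same loop can compute the median (and any other statistic) per channel from the region's bounding-box crop; with a recent scikit-image the 'image' and 'intensity_image' properties give the boolean mask and the intensity values of that crop:

for prop in regionprops(mask, img):
    # intensity values of the region only, shape (n_pixels, n_channels)
    region_pixels = prop['intensity_image'][prop['image']]
    print("Region ", prop['label'], ":")
    print("Median per channel: ", np.median(region_pixels, axis=0))
    print("Std per channel: ", np.std(region_pixels, axis=0))
    print()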

I haven't had time to benchmark any of the above approaches, but the ones used in this answer are very fast indeed, and I use them to operate over very large images quite quickly. However, it is important to note that one of the reasons this is so much faster for me is that I expect each object (each set of pixels in the label mask with the same value) to occupy only a very small part of the image. The minimum bounding box representation returned by regionprops is therefore much smaller than the original image, which drastically speeds up computation.

Thank you very much to everyone for their help.

Upvotes: 0

Raibek

Reputation: 576

The proposed method below uses matrix multiplications to speed up the calculations. It is built on two crucial NumPy tools (a small toy illustration follows the list):

  1. https://numpy.org/doc/stable/reference/generated/numpy.einsum.html?highlight=einsum#numpy.einsum

Evaluates the Einstein summation convention on the operands.

  2. https://numpy.org/doc/stable/reference/maskedarray.html

Masked arrays are arrays that may have missing or invalid entries. The numpy.ma module provides a nearly work-alike replacement for numpy that supports data arrays with masks.
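
A minimal toy illustration of what these two tools do together, using made-up data:

import numpy as np
import numpy.ma as ma

# a 2x2 "image" with 3 channels and a label mask with 2 labels
toy_img = np.arange(12).reshape(2, 2, 3)
toy_labels = np.array([[1, 1],
                       [0, 2]])

# one boolean mask per label, stacked along a new last axis -> shape (2, 2, 2)
toy_masks = np.equal.outer(toy_labels, np.arange(1, 3))

# einsum gives a (channel, label, Y, X) array in which non-selected pixels are 0
toy_applied = np.einsum("ijk,ijl->klij", toy_img, toy_masks)

# mask the non-selected pixels so the zeros do not bias the statistics
non_selected = np.repeat(np.moveaxis(~toy_masks, -1, 0)[np.newaxis], 3, axis=0)
toy_masked = ma.masked_array(toy_applied, non_selected)

print(toy_masked.mean(axis=(2, 3)))  # per-channel, per-label means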

Masked array update: the initial code was updated to use masked arrays after Cris Luengo (https://stackoverflow.com/users/7328782/cris-luengo) spotted a mistake in my initial code.

The einsum multiplication replaces all the non-selected pixels for a given label with a 0 value, and the initial code included all those zeros in the measurements.

Now we mask the non-selected pixels before calculating the measurements.

import numpy as np
import numpy.ma as ma
import pandas as pd
from tifffile import imread

sample = imread(input_img)
label_mask = np.load(input_mask)

n_labels = np.max(label_mask)

# let's create boolean label masks for each label 
# producing 3D matrix where 1st axis is label
label_mask_unraveled = np.equal.outer(label_mask, np.arange(1, n_labels + 1))

# now we can apply these boolean label masks simultaneously
# to all the sample channels with help of 'einsum' producing 4D matrix, 
# where the 1st axis is channel/stain and the 2nd axis is label
sample_label_masks_applied = np.einsum("ijk,ijl->klij", sample, label_mask_unraveled)

# in order to exclude the non-selected pixels 
# from measurement calculations, we mask the pixels first
non_selected_pixels_mask = np.moveaxis(~label_mask_unraveled, -1, 0)[np.newaxis, :, :, :]
non_selected_pixels_mask = np.repeat(non_selected_pixels_mask, sample.shape[2], axis=0)

sample_label_masks_applied = ma.masked_array(sample_label_masks_applied, non_selected_pixels_mask)    

# intensity measurement calculations
# embedded into pd.DataFrame initialization
intensity_measurements = pd.DataFrame(
    {
        "sample": args.input_img,
        "label": sample.shape[2] * list(range(1, n_labels+1)),
        "stain": n_labels * list(range(sample.shape[2])),
        "mean": ma.mean(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "median": ma.median(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "std": ma.std(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "min": ma.min(sample_label_masks_applied, axis=(2, 3)).flatten(),
        "max": ma.max(sample_label_masks_applied, axis=(2, 3)).flatten() 
    }
)

Upvotes: 0

Cris Luengo

Reputation: 60615

Your code is slow because you iterate over the whole image for each of the labels. This is an O(nk) operation, for n pixels and k labels. You could instead iterate over the image once: for each pixel, look up its label, then update that label's measurements with the pixel values. This is an O(n) operation. You would keep an accumulator for each label and each measurement (the standard deviation requires accumulating the sum of squares as well as the sum, but you are already accumulating the sum for the mean). The only measure you cannot compute this way is the median, as it requires a partial sort of the full list of values.

This would obviously be a much cheaper operation, except for the fact that Python is a slow, interpreted language, and looping over each pixel in Python leads to a very slow program. In a compiled language you would implement it this way though.

See this answer for a way to implement this efficiently using NumPy functionality.
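
For reference, the accumulator idea can also be vectorized directly in NumPy. A rough sketch (assuming sample and label_mask as in the question, and ignoring labels that do not occur in the mask):

import numpy as np

n_stains = sample.shape[2]
n_labels = int(label_mask.max())

labels_flat = label_mask.ravel()
counts = np.bincount(labels_flat, minlength=n_labels + 1)  # pixels per label (index 0 = background)

for stain in range(n_stains):
    values = sample[:, :, stain].ravel().astype(np.float64)

    # one pass over the image accumulates the sum and sum of squares per label
    sums = np.bincount(labels_flat, weights=values, minlength=n_labels + 1)
    sums_sq = np.bincount(labels_flat, weights=values ** 2, minlength=n_labels + 1)

    mean = sums / counts
    std = np.sqrt(sums_sq / counts - mean ** 2)

    # per-label min and max via unbuffered ufunc updates
    min_val = np.full(n_labels + 1, np.inf)
    max_val = np.full(n_labels + 1, -np.inf)
    np.minimum.at(min_val, labels_flat, values)
    np.maximum.at(max_val, labels_flat, values)
    # entries 1..n_labels of mean, std, min_val and max_val are the per-object statistics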


Using the DIPlib library (disclosure: I'm an author) you can apply the operation as follows (the median is not implemented). Other image processing libraries have similar functionality, though they might not be as flexible with the number of channels.

import diplib as dip
import imageio
import numpy as np

# sample = imread(input_img)
# label_mask = np.load(input_mask)
# Alternative data (with a random label mask) so that I can run the code for testing:
sample = imageio.imread("../images/trui_c.tif")
label_mask = np.random.randint(0, 20, sample.shape[:2], dtype=np.uint32)

sample = dip.Image(sample, tensor_axis=2)
msr = dip.MeasurementTool.Measure(label_mask, sample, features=["Mean", "StandardDeviation", "MinVal", "MaxVal"])
print(msr)

This prints out:

   |                                 Mean |                    StandardDeviation |                               MinVal |                               MaxVal |
-- | ------------------------------------ | ------------------------------------ | ------------------------------------ | ------------------------------------ |
   |      chan0 |      chan1 |      chan2 |      chan0 |      chan1 |      chan2 |      chan0 |      chan1 |      chan2 |      chan0 |      chan1 |      chan2 |
   |            |            |            |            |            |            |            |            |            |            |            |            |
-- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
 1 |      82.26 |      41.30 |      24.77 |      57.77 |      52.16 |      48.22 |      5.000 |      3.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 2 |      82.02 |      41.18 |      24.85 |      52.16 |      48.22 |      48.33 |      3.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 3 |      82.39 |      41.17 |      24.93 |      48.22 |      48.33 |      48.48 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 4 |      82.14 |      41.62 |      25.03 |      48.33 |      48.48 |      48.47 |      1.000 |      1.000 |      0.000 |      255.0 |      255.0 |      255.0 |
 5 |      82.89 |      41.45 |      24.94 |      48.48 |      48.47 |      48.54 |      1.000 |      0.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 6 |      82.83 |      41.60 |      25.26 |      48.47 |      48.54 |      48.65 |      0.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 7 |      81.95 |      41.77 |      25.51 |      48.54 |      48.65 |      48.22 |      1.000 |      1.000 |      2.000 |      255.0 |      255.0 |      255.0 |
 8 |      82.93 |      41.36 |      25.19 |      48.65 |      48.22 |      48.11 |      1.000 |      2.000 |      1.000 |      255.0 |      255.0 |      255.0 |
 9 |      81.88 |      41.70 |      25.07 |      48.22 |      48.11 |      47.69 |      2.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
10 |      81.46 |      41.40 |      24.82 |      48.11 |      47.69 |      48.32 |      1.000 |      1.000 |      2.000 |      255.0 |      255.0 |      255.0 |
11 |      81.33 |      40.98 |      24.76 |      47.69 |      48.32 |      48.85 |      1.000 |      2.000 |      1.000 |      255.0 |      255.0 |      255.0 |
12 |      82.30 |      41.55 |      25.12 |      48.32 |      48.85 |      48.75 |      2.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
13 |      82.43 |      41.50 |      25.15 |      48.85 |      48.75 |      48.89 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
14 |      83.29 |      42.11 |      25.65 |      48.75 |      48.89 |      48.32 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
15 |      83.20 |      41.64 |      25.28 |      48.89 |      48.32 |      48.13 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
16 |      81.51 |      40.92 |      24.76 |      48.32 |      48.13 |      48.73 |      1.000 |      1.000 |      1.000 |      255.0 |      255.0 |      255.0 |
17 |      81.81 |      41.31 |      24.71 |      48.13 |      48.73 |      48.49 |      1.000 |      1.000 |      0.000 |      255.0 |      255.0 |      255.0 |
18 |      83.58 |      41.85 |      25.25 |      48.73 |      48.49 |      32.20 |      1.000 |      0.000 |      1.000 |      255.0 |      255.0 |      212.0 |
19 |      82.12 |      41.24 |      25.06 |      48.49 |      32.20 |      24.44 |      0.000 |      1.000 |      1.000 |      255.0 |      212.0 |      145.0 |

I don't have an efficient solution for the median. You'd have to split the image into a separate array for each label, then run the median over that. This would be about as efficient as the above, but would use much more memory.
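
Roughly, that splitting could look like this in NumPy (a sketch, assuming sample and label_mask as in the question): sort the pixel values by label once, then split and take the median of each group.

import numpy as np

labels_flat = label_mask.ravel()
order = np.argsort(labels_flat, kind='stable')   # group pixel indices by label
counts = np.bincount(labels_flat)                # pixels per label (index 0 = background)
boundaries = np.cumsum(counts)[:-1]              # where each label's block ends

for stain in range(sample.shape[2]):
    values_sorted = sample[:, :, stain].ravel()[order]
    groups = np.split(values_sorted, boundaries)  # one array per label, starting at label 0
    medians = [np.median(g) if g.size else np.nan for g in groups]
    # medians[label] is the median of this stain over the pixels of that object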

Upvotes: 2
