spore234

Reputation: 3640

remove pixel annotations in dicom image

I am analyzing medical images. All images have a marker indicating the position (see the attached image).

It is the "TRH RMLO" annotation in this image, but it can be different in other images. The size also varies. The image is cropped, but you can see that the tissue starts on the right side. I found that the presence of these markers distorts my analysis.

How can I remove them?

I load the image in python like this

import dicom  # pre-1.0 pydicom API; newer versions use "import pydicom" and pydicom.dcmread
import numpy as np

img = dicom.read_file("my_image.dcm")
img_array = img.pixel_array

The image is then a numpy array. The white text is always surrounded by a large black area (black has value zero). The marker is in a different position in each image.

How can I remove the white text without hurting the tissue data?

UPDATE

added a second image


UPDATE 2: Here are two of the original DICOM files. All personal information has been removed. (edit: removed)

Upvotes: 5

Views: 4298

Answers (6)

mas

Reputation: 1

A simpler approach is still possible.

Just add the following after img_array = img.pixel_array:

img_array[img_array > X] = Y

Here X is the intensity threshold above which pixels should be eliminated, and Y is the intensity value to write in their place.

For example:

img_array[img_array > 4000] = 0

replaces every pixel brighter than 4000 with black (0).
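A self-contained sketch of that thresholding on a toy array (4000 is only the example cutoff from above; in practice you would pick it from your own intensity histogram):

```python
import numpy as np

# Toy 12-bit "image": tissue values in the low thousands, annotation
# pixels near the 4095 maximum.
img_array = np.array([[1200, 1500, 4095],
                      [ 900, 4095,  300],
                      [4095,  100,  800]])

img_array[img_array > 4000] = 0  # blank everything brighter than the cutoff
```

Note the caveat from the question: any tissue pixel above the cutoff would be blanked too, so this only works if the marker is reliably brighter than the tissue.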

Upvotes: -1

Jeru Luke

Reputation: 21203

I have another idea. This solution uses OpenCV in Python. It is a rather rough approach.

  1. First, obtain a binary threshold of the image:

     ret, th = cv2.threshold(img, 2, 255, 0)

  2. Perform morphological dilation (kernel is a structuring element you need to define first, e.g. kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))):

     dilate = cv2.morphologyEx(th, cv2.MORPH_DILATE, kernel, iterations=3)

  3. To join the gaps, I then used median filtering:

     median = cv2.medianBlur(dilate, 9)

Now you can use contour properties to eliminate the smallest contour and retain the other one containing the image.
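The final contour-filtering step boils down to keeping the largest connected white region of the binary mask. Here is a numpy-only sketch of that idea (a hypothetical 4-connected flood fill standing in for cv2.findContours, not the answer's actual code):

```python
import numpy as np

def keep_largest_component(mask):
    """Return a copy of a binary mask with only its largest
    4-connected white component kept (all smaller blobs zeroed)."""
    mask = mask.astype(bool)
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for r0 in range(mask.shape[0]):
        for c0 in range(mask.shape[1]):
            if mask[r0, c0] and labels[r0, c0] == 0:
                current += 1                 # start a new component
                stack = [(r0, c0)]
                labels[r0, c0] = current
                while stack:                 # flood-fill its pixels
                    r, c = stack.pop()
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < mask.shape[0] and 0 <= cc < mask.shape[1]
                                and mask[rr, cc] and labels[rr, cc] == 0):
                            labels[rr, cc] = current
                            stack.append((rr, cc))
    if current == 0:
        return mask.astype(np.uint8)
    sizes = [(labels == i).sum() for i in range(1, current + 1)]
    biggest = 1 + int(np.argmax(sizes))
    return (labels == biggest).astype(np.uint8)
```

In practice the tissue is by far the largest component, so everything else (the marker) is dropped.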

It also works for the second image:


Upvotes: 2

Mark Setchell

Reputation: 207375

Looking at the actual pixel values of the image you supplied, you can see that the marker is almost (99.99%) pure white and this doesn't occur elsewhere in the image so you can isolate it with a simple 99.99% threshold.

I prefer ImageMagick at the command-line, so I would do this:

convert sample.dcm -threshold 99.99% -negate mask.png


convert sample.dcm mask.png -compose darken -composite result.jpg


Of course, if the sample image is not representative, you may have to work harder. Let's look at that...

If the simple threshold doesn't work for your images, I would look at "Hit and Miss Morphology". Basically, you threshold your image to pure black and white - at around 90% say, and then you look for specific shapes, such as the corner markers on the label. So, if we want to look for the top-left corner of a white rectangle on a black background, and we use 0 to mean "this pixel must be black", 1 to mean "this pixel must be white" and - to mean "we don't care", we would use this pattern:

0 0 0 0 0
0 1 1 1 1
0 1 - - -
0 1 - - -
0 1 - - -

Hopefully you can see the top left corner of a white rectangle there. That would be like this in the Terminal:

convert sample.dcm -threshold 90% \
  -morphology HMT '5x5:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png

Now we also want to look for top-right, bottom-left and bottom-right corners, so we need to rotate the pattern, which ImageMagick handily does when you add the > flag:

convert sample.dcm -threshold 90% \
   -morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png


Hopefully you can see dots demarcating the corners of the logo now, so we could ask ImageMagick to trim the image of all extraneous black and just leave the white dots and then tell us the bounding box:

convert sample.dcm -threshold 90% \
   -morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %@ info:
308x198+1822+427

So, if I now draw a red box around those coordinates, you can see where the label has been detected - of course in practice I would draw a black box to cover it but I am explaining the idea:

convert sample.dcm -fill "rgba(255,0,0,0.5)" -draw "rectangle 1822,427 2130,625" result.png

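The hit-and-miss search itself is easy to prototype outside ImageMagick. Below is a numpy-only sketch (not Mark's pipeline) matching the same 5x5 corner pattern, where 0 means "must be black", 1 "must be white", and None "don't care":

```python
import numpy as np

# Top-left-corner pattern from the answer: 0 = black, 1 = white, None = don't care
PATTERN = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
    [0, 1, None, None, None],
    [0, 1, None, None, None],
    [0, 1, None, None, None],
]

def hit_and_miss(binary, pattern):
    """Return the (row, col) offset of every position where the
    pattern matches the 0/1 image exactly (ignoring None entries)."""
    ph, pw = len(pattern), len(pattern[0])
    h, w = binary.shape
    hits = []
    for r in range(h - ph + 1):
        for c in range(w - pw + 1):
            ok = True
            for i in range(ph):
                for j in range(pw):
                    want = pattern[i][j]
                    if want is not None and binary[r + i, c + j] != want:
                        ok = False
                        break
                if not ok:
                    break
            if ok:
                hits.append((r, c))
    return hits

# Tiny test image: a white rectangle at rows 3-8, cols 4-11 on black
img = np.zeros((20, 20), dtype=int)
img[3:9, 4:12] = 1
corners = hit_and_miss(img, PATTERN)  # one hit, just above-left of the corner
```

Rotating the pattern by 90, 180, and 270 degrees (as ImageMagick's > flag does) finds the other three corners.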

If you want a script to do that automagically, I would use something like this, saving it as HideMarker:

#!/bin/bash
input="$1"
output="$2"

# Find corners of overlaid marker using Hit and Miss Morphology, then get crop box
IFS="x+" read w h x1 y1 < <(convert "$input" -threshold 90% -morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %@ info:)

# Calculate bottom-right corner from top-left and dimensions
((x1=x1-1))
((y1=y1-1))
((x2=x1+w+1))
((y2=y1+h+1))
convert "$input" -fill black -draw "rectangle $x1,$y1 $x2,$y2" "$output"

Then you would do this to make it executable:

chmod +x HideMarker

And run it like this:

./HideMarker someImage.dcm  result.png

Upvotes: 5

user3216191

Reputation: 123

I am sure this can be optimized, but... You could create 4 patches of size 3x3 or 4x4 and initialize them with the exact pixel values of each of the four corners of the frame surrounding the annotation text. You could then iterate over the whole image (or use some smart initialization that only looks in the black area) and find an exact match for those patches. It is not very likely that the same regular structure (a 90-degree corner surrounded by near-zero pixels) occurs in the tissue, so this might give you the bounding box.
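A sketch of that exact-match search (the patch contents here are made up for illustration; in practice you would copy them from a real corner of the frame):

```python
import numpy as np

def find_patch(img, patch):
    """Return every (row, col) where `patch` occurs verbatim in `img`."""
    ph, pw = patch.shape
    h, w = img.shape
    return [(r, c)
            for r in range(h - ph + 1)
            for c in range(w - pw + 1)
            if np.array_equal(img[r:r + ph, c:c + pw], patch)]

# Hypothetical 3x3 corner patch: a 90-degree corner surrounded by zeros
corner = np.array([[0, 0, 0],
                   [0, 9, 9],
                   [0, 9, 0]])

img = np.zeros((8, 8), dtype=int)
img[2:5, 1:4] = corner   # embed the corner somewhere in a black image
```

Matching all four corner patches then gives the bounding box of the frame.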

Upvotes: 0

cneller

Reputation: 1582

If these annotations are in the DICOM file there are a couple ways they could be stored (see https://stackoverflow.com/a/4857782/1901261). The currently supported method can be cleaned off by simply removing the 60xx group attributes from the files.

For the deprecated method (which is still commonly used) you can clear out the unused high bit annotations manually without messing up the other image data as well. Something like:

int position = object.getInt( Tag.OverlayBitPosition, 0 );
if( position == 0 ) return;

int bit = 1 << position;
int[] pixels = object.getInts( Tag.PixelData );
int count = 0;
for( int pix : pixels )
{
   int overlay = pix & bit;
   pixels[ count++ ] = pix - overlay;
}
object.putInts( Tag.PixelData, VR.OW, pixels );
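The same bit-clearing can be expressed in numpy; a sketch, assuming an overlay bit position of 12 (in a real file it would come from the OverlayBitPosition (60xx,0102) attribute):

```python
import numpy as np

position = 12                 # assumed overlay bit position for this sketch
bit = 1 << position

# Toy pixel data: low bits hold image intensities, bit 12 marks overlay pixels
pixels = np.array([800, 800 | bit, 1500, bit], dtype=np.uint16)

pixels &= np.uint16(~np.uint16(bit))   # clear the overlay plane, keep image bits
```

Every pixel keeps its intensity; only the overlay plane bit is zeroed.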

If these are truly burned into the image data, you're probably stuck using one of the other recommendations here.

Upvotes: 1

sascha

Reputation: 33522

The good thing is that these watermarks are probably in an isolated, totally black area, which makes things easier (although it is questionable whether removing them is compatible with the indicated usage; licensing issues).

Without being an expert, here is one idea. It might be a sketch of a very powerful approach tailored to this problem, but you have to decide whether the implementation complexity and algorithmic complexity (very dependent on image statistics) are worth it:

Basic idea

  • Detect the four semi-cross-like borders
  • Calculate the defined rectangle from these
  • Black-out this rectangle

Steps

0

Binarize

1

  • Use some gradient-based edge-detector to get all the horizontal edges
  • There may be multiple; you can impose a minimum length (maybe some morphology is needed to connect pixels which are separated by noise in the source or the algorithm)

2

  • Use some gradient-based edge-detector to get all the vertical edges
  • Like the above, but a different orientation

3

  • Do a connected-component calculation to get objects which are vertical and horizontal lines

  • Now you can try different choices of candidate components (the 8 real ones) with the following knowledge:

    • two of these components can be described by the same line (slope-intercept form; a linear-regression problem) -> a line which borders the rectangle
    • it's probable that the best 4 pairings (according to linear-regression loss) are the valid borders of this rectangle
    • you might add the assumption that the vertical borders and horizontal borders are orthogonal to each other

4

  • Calculate the rectangle from these borders
  • Widen it by a few pixels (hyper-parameter)
  • Black-out that rectangle

That's the basic approach.
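The tail end of the pipeline (rectangle from borders, widen, black out) can be sketched directly in numpy, assuming the marker pixels have already been isolated into a binary mask:

```python
import numpy as np

def black_out_marker(img, mask, pad=2):
    """Black out the padded bounding rectangle of the True pixels in mask."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        return img                       # no marker found
    (r0, c0), (r1, c1) = coords.min(axis=0), coords.max(axis=0)
    r0, c0 = max(r0 - pad, 0), max(c0 - pad, 0)  # widen by `pad` pixels
    img[r0:r1 + 1 + pad, c0:c1 + 1 + pad] = 0
    return img
```

The `pad` hyper-parameter is the "widen it by a few pixels" step from the list above.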

Alternative

This one is much less work, uses more specialized tools, and assumes the facts stated in the opening:

  • the stuff to remove is on some completely black part of the image
  • it's kind of isolated; distance to medical-data is high

Steps

  • Run some general OCR to detect characters
  • Get the occupied pixels / borders somehow (I'm not sure what OCR tools return)
  • Calculate some outer rectangle and black-out (using some predefined widening-gap; this one needs to be much bigger than the one above)

Alternative 2

Sketch only: the idea is to use something like binary closing on the image to build fully connected components out of the source pixels (while small gaps/holes are filled), so that we get one big component describing the medical data and one for the watermark. Then just remove the smaller one.

Upvotes: 0
