Reputation: 3640
I am analyzing medical images. All images have a marker with the position. It looks like this
It is the "TRH RMLO" annotation in this image, but it can be different in other images. Also the size varies. The image is cropped but you see that the tissue is starting on the right side. I found that the presence of these markers distort my analysis.
How can I remove them?
I load the image in python like this
import dicom
import numpy as np
img = dicom.read_file(my_image.dcm)
img_array = img.pixel_array
The image is then a numpy array. The white text is always surrounded by a large black area (black has value zero). The marker is in a different position in each image.
How can I remove the white text without hurting the tissue data.
UPDATE
added a second image
UPDATE2: Here are two of the original dicom files. All personal information has been removed.edit:removed
Upvotes: 5
Views: 4298
Reputation: 1
Simpler one is still possible!!!.
Just implement following after (img_array = img.pixel_array)
img_array[img_array > X] = Y
In which X is the intensity threshold you want to eliminate after that. Also Y is the intensity value which you want to consider instead of that.
For example: img_array[img_array > 4000] = 0
Replace white matter greater than 4000 with black intensity 0.
Upvotes: -1
Reputation: 21203
I have another idea. This solution is in OpenCV using python. It is a rather solution.
First, obtain the binary threshold of the image.
Perform morphological dilation:
dilate = cv2.morphologyEx(th, cv2.MORPH_DILATE, kernel, 3)
median = cv2.medianBlur(dilate, 9)
Now you can use the contour properties to eliminate the smallest contour and retain the other containing the image.
It also works for the second image:
Upvotes: 2
Reputation: 207375
Looking at the actual pixel values of the image you supplied, you can see that the marker is almost (99.99%) pure white and this doesn't occur elsewhere in the image so you can isolate it with a simple 99.99% threshold.
I prefer ImageMagick at the command-line, so I would do this:
convert sample.dcm -threshold 99.99% -negate mask.png
convert sample.dcm mask.png -compose darken -composite result.jpg
Of course, if the sample image is not representative, you may have to work harder. Let's look at that...
If the simple threshold doesn't work for your images, I would look at "Hit and Miss Morphology". Basically, you threshold your image to pure black and white - at around 90% say, and then you look for specific shapes, such as the corner markers on the label. So, if we want to look for the top-left corner of a white rectangle on a black background, and we use 0
to mean "this pixel must be black", 1
to mean "this pixel must be white" and -
to mean "we don't care", we would use this pattern:
0 0 0 0 0
0 1 1 1 1
0 1 - - -
0 1 - - -
0 1 - - -
Hopefully you can see the top left corner of a white rectangle there. That would be like this in the Terminal:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png
Now we also want to look for top-right, bottom-left and bottom-right corners, so we need to rotate the pattern, which ImageMagick handily does when you add the >
flag:
convert sample.dcm -threshold 90% \
-morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' result.png
Hopefully you can see dots demarcating the corners of the logo now, so we could ask ImageMagick to trim the image of all extraneous black and just leave the white dots and then tell us the bounding box:
cconvert sample.dcm -threshold 90% \
-morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %@ info:
308x198+1822+427
So, if I now draw a red box around those coordinates, you can see where the label has been detected - of course in practice I would draw a black box to cover it but I am explaining the idea:
convert sample.dcm -fill "rgba(255,0,0,0.5)" -draw "rectangle 1822,427 2130,625" result.png
If you want a script to do that automagically, I would use something like this, saving it as HideMarker
:
#!/bin/bash
input="$1"
output="$2"
# Find corners of overlaid marker using Hit and Miss Morphology, then get crop box
IFS="x+" read w h x1 y1 < <(convert "$input" -threshold 90% -morphology HMT '5x5>:0,0,0,0,0 0,1,1,1,1 0,1,-,-,- 0,1,-,-,- 0,1,-,-,-' -format %@ info:)
# Calculate bottom-right corner from top-left and dimensions
((x1=x1-1))
((y1=y1-1))
((x2=x1+w+1))
((y2=y1+h+1))
convert "$input" -fill black -draw "rectangle $x1,$y1 $x2,$y2" "$output"
Then you would do this to make it executable:
chmod +x HideMarker
And run it like this:
./HideMarker someImage.dcm result.png
Upvotes: 5
Reputation: 123
I am sure this can be optimized, but ... You could create 4 patches of size 3x3 or 4x4, and initialize them with the exact content of the pixel values for each of the individual corners of the frame surrounding the annotation text. You could then iterate over the whole image (or have some smart initialization looking only in the black area) and find the exact match for those patches. It is not very likely you will have the same regular structure (90 deg corner surrounded by near 0) in the tissue, so this might give you the bounding box.
Upvotes: 0
Reputation: 1582
If these annotations are in the DICOM file there are a couple ways they could be stored (see https://stackoverflow.com/a/4857782/1901261). The currently supported method can be cleaned off by simply removing the 60xx group attributes from the files.
For the deprecated method (which is still commonly used) you can clear out the unused high bit annotations manually without messing up the other image data as well. Something like:
int position = object.getInt( Tag.OverlayBitPosition, 0 );
if( position == 0 ) return;
int bit = 1 << position;
int[] pixels = object.getInts( Tag.PixelData );
int count = 0;
for( int pix : pixels )
{
int overlay = pix & bit;
pixels[ count++ ] = pix - overlay;
}
object.putInts( Tag.PixelData, VR.OW, pixels );
If these are truly burned into the image data, you're probably stuck using one of the other recommendations here.
Upvotes: 1
Reputation: 33522
The good thing is, that these watermarks are probably in an isolated totally black are which makes it easier (although it's questionable if removing this is according to the indicated usage; license-stuff).
Without beeing an expert, here is one idea. It might be a sketch of some very very powerful approach tailored to this problem but you have to decide if implementation-complexity & algorithmic-complexity (very dependent on image-statistics) are worth it:
0
Binarize
1
2
3
Do some connected-component calculation to get some objects which are vertical and horizontal lines
Now you can try different chosings of candidate-components (8 real ones) with the following knowledge
4 - Calculate the rectangle from these borders - Widen it by a few pixels (hyper-parameter) - Black-out that rectangle
That's the basic approach.
This one is much less work, use more specialized tools and assumes the facts in the opening:
Steps
Sketch only: The idea is to use something like binary-closing on the image somehow to build fully connected-components ouf of the source pixels (while small gaps/holes are filled), so that we got one big component describing the medical-data and one for the watermark. Then just remove the smaller one.
Upvotes: 0