Reputation: 43
I have a data set with dimensions 249 x 250, and I used the following code to plot it:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
data = pd.read_excel("sample_data.xlsx")
x = np.arange(data.shape[0])
y = np.arange(data.shape[1])
mask_data = np.ma.masked_outside(data,0,233)
pl.contourf(y,x,mask_data)
pl.colorbar()
The resulting plot looked like this:
Now I want to remove the smaller patches on the right side of the plot and keep only the biggest patches. My idea is to remove any group of connected pixels whose pixel count is below a specified threshold (say, 200). How can I do this?
Upvotes: 2
Views: 1765
Reputation: 334
What you are looking to do is identify all the objects in your image. This can be done with ndimage.label from scipy (older code reaches the same function as ndimage.measurements.label, a namespace that is now deprecated).
It searches through an image for contiguous groups of nonzero pixels and assigns each group an integer label. You can then loop through the labeled regions, count the size (in pixels) of each one, and filter on that basis.
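To make the labeling step concrete, here is a minimal sketch on a toy array. The array `img` is made up for illustration. It assumes the default connectivity of `scipy.ndimage.label`, where only pixels sharing an edge count as connected, and uses `np.bincount` as one possible way to count pixels per label.

```python
import numpy as np
from scipy import ndimage

# Toy 5x6 "image" (hypothetical data): nonzero cells are foreground,
# zeros are background.
img = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 3, 3, 3, 0, 0],
    [0, 0, 0, 0, 0, 5],
])

# labeled has the same shape as img; each connected group of nonzero
# pixels gets its own integer label, and background pixels get label 0.
labeled, n_labels = ndimage.label(img)

# sizes[k] is the number of pixels carrying label k (sizes[0] is background).
sizes = np.bincount(labeled.ravel())

print("number of objects:", n_labels)
print("pixels per label (index 0 is background):", sizes)
```

Note that the labels depend only on which pixels are nonzero, not on the values stored in them.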
Even though you are pulling the data in from Excel, what you effectively have is a 249x250 pixel "image" that you are plotting. Each cell in Excel is effectively a "pixel" containing a value. To drive this point home, you could quite literally use the image-showing functions in matplotlib (e.g. plt.imshow).
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage
xn = 250
yn = 249
# fake data to illustrate that images are just matrices of values
X = np.stack([np.arange(xn)] * yn)
Y = np.stack([np.arange(yn)] * xn).transpose()
Z = np.sin(3*np.pi * X/xn) * np.cos(4*np.pi * Y/yn) * np.sin(np.pi * X/xn)
Z[Z <.5] = 0
fig,axes = plt.subplots(1,2)
axes[0].contourf(Z)
axes[0].set_title("Before Removing Features")
# now identify the objects and remove those below a size threshold
# (ndimage.label replaces the deprecated ndimage.measurements.label)
Zlabeled,Nlabels = ndimage.label(Z)
label_size = [(Zlabeled == label).sum() for label in range(Nlabels + 1)]
for label,size in enumerate(label_size): print("label %s is %s pixels in size" % (label,size))
# now remove the labels
for label,size in enumerate(label_size):
    if size < 1800:
        Z[Zlabeled == label] = 0
axes[1].contourf(Z)
axes[1].set_title("After Removing Features")
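For the masked array in the question, the same idea could be sketched roughly as follows. This is an illustration under assumptions rather than a tested solution for the original spreadsheet: the helper name `remove_small_regions` and its parameters are hypothetical, the demo array stands in for the Excel data, and the default edge-sharing connectivity of `ndimage.label` is used.

```python
import numpy as np
from scipy import ndimage

def remove_small_regions(values, lo, hi, min_size):
    """Hypothetical helper: mask values outside [lo, hi], then also mask
    every connected region of remaining pixels smaller than min_size."""
    masked = np.ma.masked_outside(np.asarray(values, dtype=float), lo, hi)
    valid = ~np.ma.getmaskarray(masked)        # True where data is still visible
    labeled, _ = ndimage.label(valid)          # label connected visible regions
    sizes = np.bincount(labeled.ravel())       # pixel count per label
    too_small = sizes < min_size
    too_small[0] = False                       # label 0 is background, already masked
    masked[too_small[labeled]] = np.ma.masked  # hide the small regions
    return masked

# Stand-in data (not the original spreadsheet): 999 lies outside [0, 233].
demo = np.full((20, 20), 999.0)
demo[2:12, 2:12] = 100.0     # 100-pixel patch, should survive
demo[15:17, 15:17] = 50.0    # 4-pixel patch, should be removed

cleaned = remove_small_regions(demo, 0, 233, min_size=10)
print("unmasked pixels left:", cleaned.count())
```

With the real data, one would presumably pass `data.to_numpy()` with `min_size=200` and hand the result to `pl.contourf`. If diagonal neighbours should also count as connected, `ndimage.label` accepts a `structure` argument, for example `np.ones((3, 3))`.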
Upvotes: 3