Reputation: 1038
I have a Python function that takes a list of smaller images, boxes (represented as arrays), and the whole image, img, as parameters and finds outliers. The outliers will be either significantly brighter or significantly darker than the other images in the list; darker is the more common case.
import numpy as np

def find_outliers(boxes, img):
    # Mean gray value of each box
    means = [np.mean(box['src']) for box in boxes]
    asc = sorted(means)
    # Tukey fences: 1.5 * IQR beyond the first and third quartiles
    q1, q3 = np.percentile(asc, [25, 75])
    iqr = q3 - q1
    lower = q1 - (1.5 * iqr)
    upper = q3 + (1.5 * iqr)
    # print('thresholds:', lower, upper)
    return list(filter(lambda x: np.mean(x['src']) < lower or np.mean(x['src']) > upper, boxes))
This method lets me derive thresholds from the image itself instead of hard-coding values, which is ideal in my situation. There are 3 problems I need to address if I continue this approach.
boxes is very small (3 or 4 items). This makes it hard for this method to find adequate lower and upper bounds.
Is there a statistical approach that is better suited to this type of problem? Is there a different way to establish threshold values based on the image?
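To make the small-sample problem concrete, here is a minimal sketch of the same IQR rule applied to three mean values. The helper name iqr_bounds and the numbers are hypothetical, chosen only for illustration; the sketch assumes NumPy's default linear interpolation in np.percentile.

```python
import numpy as np

def iqr_bounds(values):
    """Return the (lower, upper) 1.5 * IQR fences for a 1-D sequence."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical mean gray values: one clearly darker box and two normal ones.
means = [40.0, 100.0, 102.0]
lower, upper = iqr_bounds(means)

# Keep only the values that fall outside the fences.
flagged = [m for m in means if m < lower or m > upper]
print(lower, upper, flagged)
```

With only three values, the dark box itself drags the first quartile down and widens the IQR, so the lower fence (23.5 here) ends up below the darkest mean and nothing is flagged.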
Note: I have also tried the standard-deviation outlier approach, but it isn't suitable in this scenario.
Upvotes: 1
Views: 404
Reputation: 30679
Rather than finding outliers within the list of boxes, calculate the lower and upper bounds from the whole image; any box whose average gray value falls outside these bounds is considered an outlier:
def find_outliers(boxes, img):
    q1, q3 = np.percentile(img, [25, 75])
    iqr = q3 - q1
    lower = q1 - (1.5 * iqr)
    upper = q3 + (1.5 * iqr)
    # print('thresholds:', lower, upper)
    return list(filter(lambda x: np.mean(x['src']) < lower or np.mean(x['src']) > upper, boxes))
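As a rough usage sketch, the function above can be tried on a synthetic image. Everything about the test data is an assumption made for illustration: the image is seeded Gaussian noise around a gray level of 120 with one dark patch, and the 'name' key on each box is hypothetical (the original boxes are only known to carry 'src').

```python
import numpy as np

def find_outliers(boxes, img):
    # Fences come from the pixel distribution of the whole image,
    # so they do not depend on how many boxes there are.
    q1, q3 = np.percentile(img, [25, 75])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return [box for box in boxes
            if np.mean(box['src']) < lower or np.mean(box['src']) > upper]

# Synthetic grayscale image: noise around 120 with one dark 20x20 patch.
rng = np.random.default_rng(0)
img = rng.normal(120.0, 10.0, size=(100, 100))
img[:20, :20] = 20.0

# Three boxes cropped from the image; 'name' is only for readable output.
boxes = [
    {'name': 'dark', 'src': img[:20, :20]},
    {'name': 'normal_a', 'src': img[40:60, 40:60]},
    {'name': 'normal_b', 'src': img[70:90, 10:30]},
]

outliers = find_outliers(boxes, img)
print([box['name'] for box in outliers])
```

On this data only the dark box should be reported. Note that the fences depend on how spread out the image's pixel values are: a nearly uniform image gives a very small IQR, so even mild brightness differences between boxes may be flagged.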
Upvotes: 1