I have the following data and code:
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[0.], [0.], [0.], [0.], [5.25799992], [10.51700001], [15.74699956], [21.03599973], [26.41500018]])
y = np.array([181.42686706, 144.47493065, 143.93277864, 143.93277864, 166.07783771, 127.06519488, 80.16842458, 58.30687141, 48.83896311])


def no_similar_times(X: np.ndarray, y: np.ndarray) -> bool:
    # Returns True if X has no duplicates (after rounding to 1 decimal), else False.
    print(X)
    is_valid = len(np.unique(X.round(1))) == len(X)
    print(is_valid)
    print("")
    return is_valid


def get_inliers() -> tuple[RANSACRegressor, np.ndarray]:
    # The predictor is a degree-3 polynomial.
    ransac = RANSACRegressor(
        estimator=make_pipeline(PolynomialFeatures(3), LinearRegression()),
        min_samples=0.4,
        is_data_valid=no_similar_times,
    )
    ransac.fit(X, y)
    inlier_mask = ransac.inlier_mask_
    print("Inliers")
    no_similar_times(X[inlier_mask], y[inlier_mask])
    return ransac, inlier_mask


if __name__ == "__main__":
    get_inliers()
When I run this code, I obtain an inlier_mask that corresponds to invalid data, meaning that no_similar_times(X[inlier_mask], y[inlier_mask]) returns False. I did not expect this: I assumed that RANSAC skips any sample set that fails is_data_valid, so the final set of inliers should necessarily be valid.
When printing, I obtain:
[[ 0. ]
[10.51700001]
[ 0. ]
[ 0. ]]
False
[[21.03599973]
[15.74699956]
[ 0. ]
[ 5.25799992]]
True
[[ 0. ]
[21.03599973]
[10.51700001]
[26.41500018]]
True
[[26.41500018]
[ 0. ]
[10.51700001]
[ 5.25799992]]
True
[[ 0. ]
[ 0. ]
[ 0. ]
[10.51700001]]
False
Inliers
[[ 0. ]
[ 0. ]
[ 0. ]
[ 5.25799992]
[10.51700001]
[15.74699956]
[21.03599973]
[26.41500018]]
False
This means that no_similar_times is working as expected, but the output inlier mask is not one of the valid subsets generated during the fitting process.
Can someone explain what happens?