Reputation: 61
My data looks like this:
sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame(data=[sample1, sample2, sample3], columns=['list', 'cat', 'metr', 'target'])
I want to run a scikit-learn kNN regression with a specific distance function on this data.
The distance function is:
def my_distance(X, Y, **kwargs):
    if len(X) > 1:
        x = X
        y = Y
        all_minima = []
        for k in range(len(x)):
            one_minimum = min(x[k], y[k])
            all_minima.append(one_minimum)
        sum_all_minima = sum(all_minima)
        distance = (sum(x) + sum(y) - sum_all_minima) * kwargs["Para_list"]
    elif X.dtype == 'int64':
        x = X
        y = Y
        if x == y and x != -1:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = kwargs["Para_nominal"] * 1
    else:
        x = X
        y = Y
        if x == y:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = abs(x - y) * kwargs["Para_metrisch"]
    return distance
It should be registered as a valid distance function via
DistanceMetric.get_metric('pyfunc',func=my_distance)
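For context, here is a minimal sketch of how a 'pyfunc' metric is called, assuming plain numeric input. The function `scaled_manhattan` and its `scale` parameter are made up for illustration and are not part of the question. The point it shows is that the callable receives two one-dimensional numeric arrays (one full row each), so it never sees a single feature or a Python list on its own.

```python
import numpy as np

try:
    # Location in newer scikit-learn releases.
    from sklearn.metrics import DistanceMetric
except ImportError:
    # Fallback for older releases.
    from sklearn.neighbors import DistanceMetric


def scaled_manhattan(a, b, scale=1.0):
    """Hypothetical example metric: a and b are 1D float arrays (whole rows)."""
    return float(np.abs(a - b).sum() * scale)


# Extra keyword arguments given to get_metric are forwarded to the function.
dist = DistanceMetric.get_metric('pyfunc', func=scaled_manhattan, scale=2.0)

points = np.array([[0.0, 0.0],
                   [3.0, 4.0]])
D = dist.pairwise(points)  # 2x2 matrix of pairwise distances
print(D)
```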
If I'm right, the scikit-learn code should look like this:
train, test = train_test_split(paul, test_size=0.3)
# x_train should contain only the independent variables; the others are dropped:
x_train = train.drop('target', axis=1)
y_train = train['target']
x_test = test.drop('target', axis = 1)
y_test = test['target']
knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='ball_tree',
                          metric=my_distance,
                          metric_params={"Para_list": 2,
                                         "Para_minus1": 3,
                                         "Para_metrisch": 2,
                                         "Para_nominal": 4})
knn.fit(x_train,y_train)
y_pred=knn.predict(x_test)
I get
ValueError: setting an array element with a sequence.
I guess scikit can not handle a single feature item as list? Is there a way to make that happen?
Upvotes: 1
Views: 399
Reputation: 1672
I guess scikit can not handle a single feature item as list? Is there a way to make that happen?
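No, there is no way that I know of to make this happen. scikit-learn expects a purely numeric 2D array, so you need to convert the list feature into a 2D matrix (one column per list element) and concatenate it with the other 1D features to form a single feature matrix. This is standard sklearn behavior.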
Unless you have a very narrow use case, turning the list feature into a 2D array is totally fine. I assume all lists have the same length.
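A minimal sketch of that conversion, using the data from the question. Several things here are my assumptions and not from the question: the name `flat_distance`, the column layout (list elements first, then `cat`, then `metr`), and adding the three per-feature distances into one number. It uses `algorithm='brute'` because the list formula gives a non-zero distance from a row to itself, so it is not a true metric, which tree-based searches rely on.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame([sample1, sample2, sample3],
                    columns=['list', 'cat', 'metr', 'target'])

# Expand the list column into one numeric column per element, then append
# the scalar features. Assumes every list has the same length.
list_block = np.array(paul['list'].tolist(), dtype=float)   # shape (3, 6)
other_block = paul[['cat', 'metr']].to_numpy(dtype=float)   # shape (3, 2)
X = np.hstack([list_block, other_block])                    # shape (3, 8)
y = paul['target'].to_numpy()

N_LIST = list_block.shape[1]  # number of leading columns that form the list


def flat_distance(a, b, Para_list=2, Para_minus1=3,
                  Para_metrisch=2, Para_nominal=4):
    """Distance between two flattened rows (1D float arrays).

    Summing the three parts is an assumption made for this sketch.
    """
    la, lb = a[:N_LIST], b[:N_LIST]
    d_list = (la.sum() + lb.sum() - np.minimum(la, lb).sum()) * Para_list

    cat_a, cat_b = a[N_LIST], b[N_LIST]
    if cat_a == cat_b and cat_a != -1:
        d_cat = 0.0
    elif cat_a == -1 or cat_b == -1:
        d_cat = Para_minus1
    else:
        d_cat = Para_nominal

    m_a, m_b = a[N_LIST + 1], b[N_LIST + 1]
    if m_a == m_b:
        d_metr = 0.0
    elif m_a == -1 or m_b == -1:
        d_metr = Para_minus1
    else:
        d_metr = abs(m_a - m_b) * Para_metrisch

    return d_list + d_cat + d_metr


knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='brute',
                          metric=flat_distance,
                          metric_params={"Para_list": 2, "Para_minus1": 3,
                                         "Para_metrisch": 2, "Para_nominal": 4})
knn.fit(X, y)
pred = knn.predict(X[:1])
print(pred)
```

The metric slices each row back into its list part and its scalar parts, so the per-feature logic from the question is preserved even though sklearn only ever sees a flat numeric matrix.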
Upvotes: 1