paule

kNN: feature should be passed through as a list

My data looks like this:

import pandas as pd

sample1 = [[1, 0, 3, 5, 0, 9], 0, 1.5, 0]
sample2 = [[0, 4, 0, 6, 2, 0], 2, 1.9, 1]
sample3 = [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]
paul = pd.DataFrame(data=[sample1, sample2, sample3], columns=['list', 'cat', 'metr', 'target'])

On this data I want to run a scikit-learn kNN regression with a custom distance function.

The distance function is:

def my_distance(X,Y,**kwargs):
    # List feature: sum(x) + sum(y) - sum of elementwise minima, scaled by Para_list
    if len(X)>1:
        x = X
        y = Y
        all_minima = []
        for k in range(len(x)):
            one_minimum = min(x[k],y[k])
            all_minima.append(one_minimum)

        sum_all_minima=sum(all_minima)
        distance = (sum(x)+sum(y)-sum_all_minima) * kwargs["Para_list"]

    # Nominal (integer-coded) feature: 0 if equal, Para_minus1 if missing (-1/None), else Para_nominal
    elif X.dtype=='int64':
        x = X
        y = Y
        if x == y and x != -1:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = kwargs["Para_nominal"] * 1
    # Metric feature: 0 if equal, Para_minus1 if missing (-1/None), else scaled absolute difference
    else:
        x = X
        y = Y
        if x == y:
            distance = 0
        elif x == -1 or y == -1 or x is None or y is None:
            distance = kwargs["Para_minus1"] * 1
        else:
            distance = abs(x-y) * kwargs["Para_metrisch"]
    return distance
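
To see what such a callable actually receives, here is a small sketch (the names `recording_metric` and `seen_shapes` are hypothetical). scikit-learn documents that a callable metric is called on pairs of rows, so each call should get two 1D arrays with one entry per feature. If that holds, the `len(X)>1` branch of `my_distance` would be taken on every call for multi-feature data, and the per-feature branches would never be reached.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

seen_shapes = []  # shapes of the two arguments of every metric call

def recording_metric(a, b):
    # Hypothetical helper: log what scikit-learn passes in, then return plain L1.
    seen_shapes.append((np.shape(a), np.shape(b)))
    return float(np.abs(np.asarray(a) - np.asarray(b)).sum())

X = np.array([[0.0, 1.0, 2.0],
              [1.0, 1.0, 1.0],
              [5.0, 0.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])

knn = KNeighborsRegressor(n_neighbors=2, algorithm='ball_tree', metric=recording_metric)
knn.fit(X, y)
pred = knn.predict(X[:1])

# Each call is expected to receive two whole rows of 3 features each.
print(set(seen_shapes))
```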

It should be wrapped as a valid distance metric with:

DistanceMetric.get_metric('pyfunc',func=my_distance)
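
`get_metric('pyfunc', func=...)` wraps a plain Python function, and extra keyword arguments given to `get_metric` are forwarded to that function on each call, which is how parameters such as `Para_list` could reach `my_distance`. A minimal sketch with a hypothetical weighted L1 function `weighted_l1` and weight `w` (the import location depends on the scikit-learn version, hence the fallback):

```python
import numpy as np

try:
    from sklearn.metrics import DistanceMetric  # scikit-learn >= 1.0
except ImportError:
    from sklearn.neighbors import DistanceMetric  # older releases

def weighted_l1(x, y, w=1.0):
    # Hypothetical example metric: L1 distance scaled by a weight w.
    return w * float(np.abs(x - y).sum())

# The extra keyword argument w=2.0 is passed on to weighted_l1 on every call.
dm = DistanceMetric.get_metric('pyfunc', func=weighted_l1, w=2.0)

points = np.array([[0.0, 0.0],
                   [3.0, 4.0]])
D = dm.pairwise(points)  # off-diagonal entries are 2.0 * (3 + 4) = 14.0
print(D)
```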

If I understand correctly, the scikit-learn code should look like this:

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

train, test = train_test_split(paul, test_size=0.3)

# x_train should only contain the independent variables; the others are removed:
x_train = train.drop('target', axis=1)
y_train = train['target']

x_test = test.drop('target', axis=1)
y_test = test['target']

knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='ball_tree',
                          metric=my_distance,
                          metric_params={"Para_list": 2,
                                         "Para_minus1": 3,
                                         "Para_metrisch": 2,
                                         "Para_nominal": 4})
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)

I get

ValueError: setting an array element with a sequence.
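
A likely explanation, sketched with pandas and NumPy only: scikit-learn's input validation most likely converts `X` to a float NumPy array before any metric is called, so the `list` column (dtype `object`) has to be cast to float, and NumPy refuses to put a whole list into one float cell. The frame `df` below is a hypothetical two-row copy of `paul` without the target.

```python
import numpy as np
import pandas as pd

# Two rows shaped like `paul`, without the target column.
df = pd.DataFrame({'list': [[1, 0, 3, 5, 0, 9], [0, 4, 0, 6, 2, 0]],
                   'cat': [0, 2],
                   'metr': [1.5, 1.9]})

print(df.dtypes['list'])  # object: each cell holds a whole Python list

try:
    # Roughly the float conversion that input validation performs.
    df.to_numpy().astype(np.float64)
    error = None
except ValueError as exc:
    error = exc

print(error)
```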

I guess scikit-learn cannot handle a single feature whose value is a list? Is there a way to make that work?

Answers (1)

ptyshevs

Quoting the question: "I guess scikit-learn cannot handle a single feature whose value is a list? Is there a way to make that work?"

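No, I know of no way to make this work. You need to expand this feature into a 2D matrix (one column per list element) and concatenate it with the other 1D features to form a proper feature array. This is standard scikit-learn behavior: estimators expect a numeric array of shape (n_samples, n_features).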
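
The conversion can be sketched like this, assuming every list has the same length: stack the lists into a 2D block with one column per list element, then append the scalar columns `cat` and `metr`.

```python
import numpy as np
import pandas as pd

paul = pd.DataFrame(
    data=[[[1, 0, 3, 5, 0, 9], 0, 1.5, 0],
          [[0, 4, 0, 6, 2, 0], 2, 1.9, 1],
          [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]],
    columns=['list', 'cat', 'metr', 'target'])

# One row per sample, one column per list element: shape (3, 6).
list_block = np.array(paul['list'].tolist(), dtype=float)

# Append the 1D features as extra columns to get a plain float matrix.
X = np.column_stack([list_block, paul['cat'], paul['metr']]).astype(float)
y = paul['target'].to_numpy()

print(X.shape)  # (3, 8)
```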

Unless you have a very narrow use case, turning the list feature into a 2D array is perfectly fine. I assume all lists have the same length.
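
Once the data is a plain 2D array, the custom distance has to work on whole rows, because the metric receives one complete row per sample. Below is a minimal sketch under several assumptions: the first columns hold the list feature, followed by `cat` and `metr`; the three per-feature rules from `my_distance` are simply summed (the question does not say how they should be combined); the `-1`/missing-value rules are left out for brevity; and `row_distance` and `N_LIST` are hypothetical names. `algorithm='brute'` is used because `ball_tree` relies on a true metric (triangle inequality, zero self-distance), and the list rule gives a non-zero distance from a row to itself.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

paul = pd.DataFrame(
    data=[[[1, 0, 3, 5, 0, 9], 0, 1.5, 0],
          [[0, 4, 0, 6, 2, 0], 2, 1.9, 1],
          [[9, 7, 6, 0, 0, 0], 0, 1.3, 1]],
    columns=['list', 'cat', 'metr', 'target'])

# The answer assumes all lists have the same length; check it explicitly.
lengths = paul['list'].map(len)
assert lengths.nunique() == 1
N_LIST = int(lengths.iloc[0])  # number of columns taken by the list feature

# Columns 0..N_LIST-1: list feature, then cat, then metr.
X = np.column_stack([np.array(paul['list'].tolist(), dtype=float),
                     paul['cat'], paul['metr']]).astype(float)
y = paul['target'].to_numpy()

def row_distance(a, b, Para_list=1, Para_nominal=1, Para_metrisch=1):
    # Hypothetical: apply the question's per-feature rules to slices of one row and sum them.
    la, lb = a[:N_LIST], b[:N_LIST]
    d_list = (la.sum() + lb.sum() - np.minimum(la, lb).sum()) * Para_list
    d_cat = 0.0 if a[N_LIST] == b[N_LIST] else Para_nominal
    d_metr = abs(a[N_LIST + 1] - b[N_LIST + 1]) * Para_metrisch
    return float(d_list + d_cat + d_metr)

knn = KNeighborsRegressor(n_neighbors=2,
                          algorithm='brute',
                          metric=row_distance,
                          metric_params={"Para_list": 2,
                                         "Para_nominal": 4,
                                         "Para_metrisch": 2})
knn.fit(X, y)
y_pred = knn.predict(X[:1])
print(y_pred)  # [0.5]: the two nearest rows are rows 0 and 1
```

With these weights the distances from row 0 are 36, 54.8 and 72.4, so rows 0 and 1 are its two neighbors and the prediction is the mean of their targets.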
