Reputation: 57
I would like to first split my data into a test and a train set. Then I want to use GridSearchCV on my training set (internally split into a train/validation set). In the end I want to collect all the test data and do some other things (not in the scope of the question).
I have to scale my data, so I want to handle this problem in a pipeline. Some things in my SVC should be fixed (kernel='rbf', class_weight=...). When I run the code the following occurs:
"ValueError: Invalid parameter estimator for estimator Pipeline"
I don't understand what I'm doing wrong. I tried to follow this thread: StandardScaler with Pipelines and GridSearchCV
The only difference is that I fix some parameters in my SVC. How can I handle this?
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

target = np.array(target).ravel()
loo = LeaveOneOut()
loo.get_n_splits(input)
# Outer Loop
for train_index, test_index in loo.split(input):
    X_train, X_test = input[train_index], input[test_index]
    y_train, y_test = target[train_index], target[test_index]
    p_grid = {'estimator__C': np.logspace(-5, 2, 20),
              'estimator__gamma': np.logspace(-5, 3, 20)}
    SVC_Kernel = SVC(kernel='rbf', class_weight='balanced', tol=10e-4, max_iter=200000, probability=False)
    pipe_SVC = Pipeline([('scaler', RobustScaler()), ('SVC', SVC_Kernel)])
    n_splits = 5
    scoring = "f1_micro"
    inner_cv = StratifiedKFold(n_splits=n_splits,
                               shuffle=True, random_state=5)
    clfSearch = GridSearchCV(estimator=pipe_SVC, param_grid=p_grid,
                             cv=inner_cv, scoring=scoring, iid=False, n_jobs=-1)
    clfSearch.fit(X_train, y_train)
    print("Best parameters set found on validation set for Support Vector Machine:")
    print()
    print(clfSearch.best_params_)
    print()
    print(clfSearch.best_score_)
    print("Grid scores on validation set:")
    print()
I also tried it this way:
p_grid = {'estimator__C': np.logspace(-5, 2, 20),
          'estimator__gamma': np.logspace(-5, 3, 20),
          'estimator__tol': [10e-4],
          'estimator__kernel': ['rbf'],
          'estimator__class_weight': ['balanced'],
          'estimator__max_iter': [200000],
          'estimator__probability': [False]}
SVC_Kernel = SVC()
This also doesn't work.
Upvotes: 1
Views: 264
Reputation: 5455
The problem is in your p_grid. You are grid searching on your Pipeline, and that doesn't have a step called estimator. It does have a step called SVC (the name you gave it in Pipeline([('scaler', ...), ('SVC', SVC_Kernel)])), so if you want to set that SVC's parameters, you should prefix your keys with SVC__ instead of estimator__. The parameters you fixed in the SVC constructor stay as they are; only the keys listed in the grid are varied. So replace p_grid with:
p_grid = {'SVC__C': np.logspace(-5, 2, 20),
          'SVC__gamma': np.logspace(-5, 3, 20)}
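If you are unsure which keys a pipeline accepts, one way to check is to list them with get_params(). The following is a minimal sketch, assuming the same step names as in the question ('scaler' and 'SVC'); the variable names valid_names and svc_names are only for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# Same pipeline layout as in the question; the step names are 'scaler' and 'SVC'.
pipe_SVC = Pipeline([
    ('scaler', RobustScaler()),
    ('SVC', SVC(kernel='rbf', class_weight='balanced')),
])

# get_params() lists every parameter name that param_grid may refer to.
valid_names = sorted(pipe_SVC.get_params().keys())
svc_names = [name for name in valid_names if name.startswith('SVC__')]
print(svc_names)

p_grid = {'SVC__C': np.logspace(-5, 2, 20),
          'SVC__gamma': np.logspace(-5, 3, 20)}

# Report any grid key the pipeline would reject (empty list means all keys are valid).
unknown = [key for key in p_grid if key not in valid_names]
print("Unknown keys:", unknown)
```

Any key that does not show up in that list triggers the "Invalid parameter" ValueError when the search is fitted.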
Also, you can replace your outer for loop with the cross_validate function.
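As a rough sketch of what that could look like: the GridSearchCV object is itself an estimator, so it can be passed to cross_validate with LeaveOneOut as the outer splitter. The data here is synthetic (make_classification stands in for your input and target), the grid is much coarser than yours so that it runs quickly, and the outer scoring of 'accuracy' is my assumption. I also left out iid, since that argument was removed from GridSearchCV in later scikit-learn releases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, LeaveOneOut,
                                     StratifiedKFold, cross_validate)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption, not from the question).
X, y = make_classification(n_samples=30, n_features=5, random_state=0)

pipe_SVC = Pipeline([
    ('scaler', RobustScaler()),
    ('SVC', SVC(kernel='rbf', class_weight='balanced', tol=10e-4)),
])

# Coarse grid to keep the example fast; widen it for real use.
p_grid = {'SVC__C': np.logspace(-2, 2, 3),
          'SVC__gamma': np.logspace(-3, 1, 3)}

# Inner loop: hyperparameter search on each outer training set.
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=5)
clfSearch = GridSearchCV(estimator=pipe_SVC, param_grid=p_grid,
                         cv=inner_cv, scoring='f1_micro')

# Outer loop: leave-one-out, each split refits the whole search.
results = cross_validate(clfSearch, X, y, cv=LeaveOneOut(),
                         scoring='accuracy', return_estimator=True)

print("Mean outer score:", results['test_score'].mean())
for fitted_search in results['estimator'][:3]:
    print(fitted_search.best_params_)
```

With return_estimator=True you get the fitted search of every outer split back, so best_params_ and best_score_ are still available per split, as in your print statements.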
Upvotes: 1