Reputation: 214
I am currently trying to implement an NER model using the sklearn_crfsuite library.
The training code is as follows:
for repeat in range(10):
    crf = sklearn_crfsuite.CRF(
        algorithm='lbfgs',
        c1=0.1,
        c2=0.1,
        max_iterations=100,
        all_possible_transitions=True,
        verbose=True
    )
    crf.fit(X_train, y_train)
    pred_list = crf.predict(X_test)
The code trains the model ten times. My goal is to observe 10 different scores and average them into a final score. However, every repeat gives the same score, even though I reinitialize the model in each loop.
The question is: how do I properly set a random seed so that each repeat gives a different result?
NOTE: Shuffling the training data in each loop still gives the same results. Finally, I changed the training algorithm from 'lbfgs' (gradient descent using the L-BFGS method) to 'l2sgd' (stochastic gradient descent with L2 regularization), and then I started to obtain different results.
Upvotes: 0
Views: 318
Reputation: 6270
You are probably not looking for a random seed; what you want is cross-validation.
The full documentation can be found here.
If you want 10 different evaluations, you can use 10-fold cross-validation (cv=10) inside a randomized hyperparameter search:
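import scipy.stats
import sklearn_crfsuite
from sklearn_crfsuite import metrics
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    max_iterations=100,
    all_possible_transitions=True,
    verbose=True
)
# c1 and c2 are sampled from these distributions on every search iteration
params_space = {
    'c1': scipy.stats.expon(scale=0.5),
    'c2': scipy.stats.expon(scale=0.05),
}
# use the same metric for evaluation
# (labels must be defined beforehand: the list of tag names to score)
f1_scorer = make_scorer(metrics.flat_f1_score,
                        average='weighted', labels=labels)
# search: 50 sampled parameter settings, each scored with 10-fold CV
rs = RandomizedSearchCV(crf, params_space,
                        cv=10,
                        verbose=1,
                        n_jobs=-1,
                        n_iter=50,
                        scoring=f1_scorer)
rs.fit(X_train, y_train)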
After fitting, the best parameters are available in rs.best_params_ and the best cross-validated score in rs.best_score_.
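If the goal is only to average 10 scores for one fixed set of hyperparameters, the variation has to come from the data split rather than from the optimizer, since the question reports that 'lbfgs' gives identical results on identical data. Below is a minimal sketch of that idea using only the standard library. The names repeated_holdout, train_and_score, and majority_baseline are hypothetical and not part of sklearn_crfsuite; in practice train_and_score would wrap crf.fit, crf.predict, and the chosen metric.

```python
import random
import statistics
from collections import Counter


def repeated_holdout(X, y, train_and_score, n_repeats=10,
                     test_fraction=0.2, base_seed=0):
    """Score a model on n_repeats different random train/test splits."""
    indices = list(range(len(X)))
    n_test = max(1, int(len(indices) * test_fraction))
    scores = []
    for repeat in range(n_repeats):
        # A different seed per repeat gives a different but reproducible split.
        rng = random.Random(base_seed + repeat)
        shuffled = indices[:]
        rng.shuffle(shuffled)
        test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]
        scores.append(train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx],
        ))
    return scores, statistics.mean(scores)


def majority_baseline(X_tr, y_tr, X_te, y_te):
    """Toy stand-in for a real model: always predict the most common training label."""
    most_common = Counter(lab for seq in y_tr for lab in seq).most_common(1)[0][0]
    total = sum(len(seq) for seq in y_te)
    correct = sum(lab == most_common for seq in y_te for lab in seq)
    return correct / total  # token-level accuracy


# Toy data: 20 sequences of 3 tokens each (assumed shape: list of feature-dict lists).
X = [[{"word": f"w{i}"}] * 3 for i in range(20)]
y = [["O", "O", "B-PER"] if i % 2 else ["O", "O", "O"] for i in range(20)]

scores, mean_score = repeated_holdout(X, y, majority_baseline)
print(scores, mean_score)
```

With a real CRF, replacing majority_baseline by a function that builds a fresh sklearn_crfsuite.CRF, fits it on the training part, and returns a score on the held-out part should give 10 scores that differ because the splits differ.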
Upvotes: 2