Reputation: 6796
Is there any way to return probabilities and the actual class using Trainer.predict?
I checked the documentation at this page but couldn't figure it out. As of now, it seems to return logits.
Obviously, both the probabilities and the actual class could be computed with additional code, but I'm wondering whether there is a prebuilt method that does the same.
My current output is below:
new_predictions=trainer.predict(dataset_for_future_predicition_after_tokenizer)
new_predictions
PredictionOutput(predictions=array([[-0.43005577,  3.646306  , -0.8073783 , -1.0651836 , -1.3480505 ,
        -1.108013  ],
       [ 3.5415223 , -0.8513837 , -1.8553216 , -0.18011567, -0.35627165,
        -1.8364134 ],
       [-1.0167522 , -0.8911268 , -1.7115675 ,  0.01204597,  1.7177908 ,
         1.0401527 ],
       [-0.82407415, -0.46043932, -1.089274  ,  2.6252217 ,  0.33935028,
        -1.3623345 ]], dtype=float32), label_ids=None, metrics={'test_runtime': 0.0182, 'test_samples_per_second': 219.931, 'test_steps_per_second': 54.983})
Upvotes: 1
Views: 7848
Reputation: 441
As you mentioned, Trainer.predict returns the raw output of the model, which for a classification model is the logits.
If you want the labels and scores for each class, I recommend using the pipeline that corresponds to your model's task (TextClassification, TokenClassification, etc.). The pipeline's __call__ method accepts a return_all_scores parameter that returns the score for every label on a prediction. (In recent transformers releases this parameter is deprecated in favor of top_k=None.)
Here's an example:
from transformers import TextClassificationPipeline, AutoTokenizer, AutoModelForSequenceClassification

# Name or local path of the fine-tuned model
MODEL_NAME = "..."

# Load the tokenizer and the classification model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

# Build the pipeline and request the score of every label
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
prediction = pipe("The text to predict", return_all_scores=True)
The prediction variable will look something like this:
[{'label': 'LABEL1', 'score': 0.80}, {'label': 'LABEL2', 'score': 0.15}, {'label': 'LABEL3', 'score': 0.05}]
The label names can be set in the model's config.json file or when creating the model (before training it) by passing the id2label and label2id parameters:
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=num_labels,
    label2id={"Greeting": 0, "Help": 1, "Farewell": 2},
    id2label={0: "Greeting", 1: "Help", 2: "Farewell"},
)
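If you would rather stay with Trainer.predict, the "additional coding" mentioned in the question is short. Below is a minimal sketch that applies a softmax and an argmax to the logits with NumPy. The logits are copied from the question's output; the softmax helper and the id2label mapping are hypothetical stand-ins written for this illustration (in practice you would read the logits from new_predictions.predictions and the mapping from model.config.id2label).

```python
import numpy as np

# Logits copied from the question's PredictionOutput (4 samples, 6 classes).
# In practice: logits = new_predictions.predictions
logits = np.array([
    [-0.43005577, 3.646306, -0.8073783, -1.0651836, -1.3480505, -1.108013],
    [3.5415223, -0.8513837, -1.8553216, -0.18011567, -0.35627165, -1.8364134],
    [-1.0167522, -0.8911268, -1.7115675, 0.01204597, 1.7177908, 1.0401527],
    [-0.82407415, -0.46043932, -1.089274, 2.6252217, 0.33935028, -1.3623345],
], dtype=np.float32)


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis (helper for this sketch)."""
    shifted = x - x.max(axis=axis, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


# One row of class probabilities per sample; each row sums to 1.
probabilities = softmax(logits)

# Index of the most likely class for each sample.
predicted_ids = probabilities.argmax(axis=-1)

# Hypothetical label mapping for illustration only;
# with a real model, use model.config.id2label instead.
id2label = {i: f"LABEL_{i}" for i in range(logits.shape[1])}
predicted_labels = [id2label[int(i)] for i in predicted_ids]

print(predicted_ids.tolist())  # [1, 0, 4, 3]
print(predicted_labels)
```

Softmax is the right choice for single-label classification; for a multi-label model you would apply a sigmoid to each logit instead.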
Upvotes: 5