Reputation: 13
This maybe the most beginner question of all :sweat:.
I just started learning about NLP and hugging face. The first thing I'm trying to do is to apply one the bioBERT models on some clinical note data and see what I do, before moving on to the fine-tuning the model. And it looks like "emilyalsentzer/Bio_ClinicalBERT" to be the closest model for my data.
But as I try to use it for any of the analyses I always get this warning.
Some weights of the model checkpoint at emilyalsentzer/Bio_ClinicalBERT were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight']
From the hugging face course chapter 2 I understand this meant.
This is because BERT has not been pretrained on classifying pairs of sentences, so the head of the pretrained model has been discarded and a new head suitable for sequence classification has been added instead. The warnings indicate that some weights were not used (the ones corresponding to the dropped pretraining head) and that some others were randomly initialized (the ones for the new head). It concludes by encouraging you to train the model, which is exactly what we are going to do now.
So I went on to test which NLP task I can use "emilyalsentzer/Bio_ClinicalBERT" for, out of the box.
from transformers import pipeline, AutoModel
checkpoint = "emilyalsentzer/Bio_ClinicalBERT"
nlp_task = ['conversational', 'feature-extraction', 'fill-mask', 'ner',
'question-answering', 'sentiment-analysis', 'text-classification',
'token-classification',
'zero-shot-classification' ]
for task in nlp_task:
print(task)
process = pipeline(task=task, model = checkpoint)
And I got the same warning message for all the NLP tasks, so it appears to me that I shouldn't/advised not to use the model for any of the tasks. This really confuses me. The original bio_clinicalBERT model paper stated that they had good results on a few different tasks. So certainly the model was trained for those tasks. I also have similar issue with other models as well, i.e. the blog or research papers said a model obtained good results with a specific task but when I tried to apply with pipeline it gives the warning message. Is there any reason why the head layers were not included in the model?
I only have a few hundreds clinical notes (also unannotated :frowning_face:), so it doesn't look like it's big enough for training. Is there any way I could use the model on my data without training?
Thank you for your time.
Upvotes: 1
Views: 1783
Reputation: 327
This Bio_ClinicalBERT model is trained for Masked Language Model (MLM) task. This task basically used for learning the semantic relation of the token in the language/domain. For downstream tasks, you can fine-tune the model's header with your small dataset, or you can use a fine-tuned model like Bio_ClinicalBERT-finetuned-medicalcondition which is the fine-tuned version of the same model. You can find all the fine-tuned models in HuggingFace by searching 'bio-clinicalBERT' as in the link.
Upvotes: 1