Reputation: 2161
I want to do Chinese textual similarity with Hugging Face:
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForSequenceClassification.from_pretrained('bert-base-chinese')
It doesn't work; the system reports the following messages:
Some weights of the model checkpoint at bert-base-chinese were not used when initializing TFBertForSequenceClassification: ['nsp___cls', 'mlm___cls']
- This IS expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing TFBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-chinese and are newly initialized: ['classifier', 'dropout_37']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
But I can use Hugging Face to do named entity recognition:
from transformers import BertTokenizer, TFBertForTokenClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForTokenClassification.from_pretrained('bert-base-chinese')
Does that mean Hugging Face hasn't provided a Chinese sequence classification model? If my judgment is right, how can I solve this problem on Colab with only 12 GB of memory?
Upvotes: 2
Views: 3769
Reputation: 91
The reason is simple: the 'bert-base-chinese' checkpoint has not been fine-tuned for the sequence classification task. When you load it into a sequence classification model, the pretraining heads ('nsp___cls', 'mlm___cls') are discarded, and the new layers ('classifier', 'dropout_37') are initialized with random weights. It is only a warning, but it means the model will give random predictions because of that randomly initialized last layer.
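A quick way to see this (a sketch, assuming a recent transformers release where model outputs expose .logits; the example sentences are made up): two fresh loads of the checkpoint give different logits for the same input, because the classification head is re-initialized randomly on each load.

from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
inputs = tokenizer("今天天气很好", "今天天气不错", return_tensors='tf')

# Each load re-initializes the 'classifier' head with different random
# weights, so the two printed logit arrays will not match.
model_a = TFBertForSequenceClassification.from_pretrained('bert-base-chinese')
model_b = TFBertForSequenceClassification.from_pretrained('bert-base-chinese')
print(model_a(inputs).logits.numpy())
print(model_b(inputs).logits.numpy())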
BTW @andy, you didn't post the output for token classification. It should show a similar warning, but with the ['classifier'] layer reported as randomly initialized.
Use a model that has already been fine-tuned for this task; otherwise you will need to fine-tune the loaded model yourself.
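A minimal fine-tuning sketch, assuming a recent transformers version with the TensorFlow/Keras API. The sentence pairs and labels below are made-up placeholders; a real Chinese similarity dataset such as LCQMC would replace them. A small batch size and short max_length keep memory use well within Colab's 12 GB:

import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = TFBertForSequenceClassification.from_pretrained('bert-base-chinese',
                                                        num_labels=2)

# Placeholder data: sentence pairs and binary similar/not-similar labels.
pairs = [("今天天气很好", "今天天气不错"), ("我喜欢猫", "股票又跌了")]
labels = [1, 0]

# Encoding both sentences of a pair in one call makes the tokenizer join
# them with [SEP], which is how BERT expects sentence-pair tasks.
enc = tokenizer([p[0] for p in pairs], [p[1] for p in pairs],
                padding=True, truncation=True, max_length=64,
                return_tensors='tf')

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
model.fit(dict(enc), tf.constant(labels), epochs=3, batch_size=8)

After training, the "newly initialized" classifier layer is no longer random, and the warning can be ignored when you reload your own saved checkpoint for inference.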
Upvotes: 1