Moonreaderx

Reputation: 31

Fine-tune BERT for a specific domain on a different language?

I want to fine-tune a pre-trained BERT model. However, my task uses data from a specific domain (say biomedical data). Additionally, my data is in a language other than English (say Dutch).

Now I could fine-tune the Dutch bert-base-dutch-cased pre-trained model. But how would I go about fine-tuning a biomedical BERT model, like BioBERT, which is in the correct domain but the wrong language?

I have thought about using NMT to translate my data, but I don't think it's viable or worth the effort. If I fine-tune without any alterations to the model, I fear that it will not learn the task well, since it was pre-trained on a completely different language.

Upvotes: 1

Views: 2262

Answers (3)

Phanendra Sai Ram

Reputation: 1

I don't think it works; even if it does, there is a high chance that BERT will tokenize many Dutch words as unknown. So I'd suggest you instead try fine-tuning this multilingual BERT model: https://huggingface.co/google-bert/bert-base-multilingual-cased
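
A minimal loading sketch, assuming Hugging Face Transformers and a classification task; num_labels and the example sentence are placeholders, not part of the original suggestion:

    # Minimal sketch: loading multilingual BERT for a downstream classification task.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_name = "google-bert/bert-base-multilingual-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # num_labels=2 is a placeholder; set it to the number of classes in your task.
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    # Dutch example sentence: mBERT's vocabulary covers Dutch, so far fewer
    # tokens fall back to [UNK] than with an English-only model like BioBERT.
    inputs = tokenizer("De patiënt kreeg antibiotica voorgeschreven.", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.logits.shape)  # (1, num_labels)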

Upvotes: 0

Ghada Mansour

Reputation: 11

I have never tried this before, but I believe you can apply task-adaptive pretraining (TAPT) to a Dutch BERT model: continue pretraining it on a small amount of biomedical data written in Dutch, so that it augments its general knowledge of Dutch with the domain-specific knowledge your task (the biomedical task you are interested in) requires.
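
A minimal sketch of such continued masked-language-model pretraining, assuming Hugging Face Transformers, the GroNLP/bert-base-dutch-cased checkpoint, and a plain-text file dutch_biomedical.txt holding your Dutch in-domain corpus (these names are illustrative assumptions, not a prescribed setup):

    # Minimal sketch: task/domain-adaptive pretraining (continued MLM training)
    # of a Dutch BERT checkpoint on a small in-domain Dutch corpus.
    from datasets import load_dataset
    from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    model_name = "GroNLP/bert-base-dutch-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # "dutch_biomedical.txt" is a placeholder for your own in-domain text file.
    dataset = load_dataset("text", data_files={"train": "dutch_biomedical.txt"})["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
        batched=True,
        remove_columns=["text"],
    )

    # Randomly mask 15% of tokens, the standard BERT masking rate.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
    args = TrainingArguments(output_dir="dutch-bio-bert",
                             num_train_epochs=1,
                             per_device_train_batch_size=8)
    Trainer(model=model, args=args, train_dataset=dataset, data_collator=collator).train()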

Upvotes: 0

samuelstevens

Reputation: 31

"I just want to know if there are any methods that allow for fine-tuning a pre-trained BERT model trained on a specific domain and using it for data within that same domain, but in a different language"

Probably not. BERT's vocabulary is fixed at the start of pre-training, and adding new vocabulary afterwards means those token embeddings are randomly initialized.
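
To illustrate the point, a minimal sketch assuming Hugging Face Transformers; the added Dutch terms are hypothetical examples of out-of-vocabulary domain words:

    # Minimal sketch: adding new vocabulary to a pretrained BERT means the
    # corresponding embedding rows start out as random weights.
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_name = "bert-base-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)

    # Hypothetical Dutch biomedical terms that are not in the English vocabulary.
    added = tokenizer.add_tokens(["geneesmiddel", "bijwerkingen"])
    model.resize_token_embeddings(len(tokenizer))

    if added:
        # The last `added` rows of the embedding matrix are freshly initialized
        # random weights; they carry no pretrained knowledge until trained.
        print(model.get_input_embeddings().weight[-added:].shape)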

Instead, I would:

  1. Look for a multilingual, domain-specific version of BERT, as @Ashwin said.
  2. Fine-tune Dutch BERT on your task and see if performance is acceptable; in general, BERT adapts to different tasks quite well (a minimal sketch follows this list).
  3. (If you have the resources available) Continue pre-training Dutch BERT on your specific domain (as SciBERT did for scientific text, for example) and then fine-tune on your task.
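
A minimal sketch of option 2, assuming Hugging Face Transformers, the GroNLP/bert-base-dutch-cased checkpoint, and a hypothetical train.csv with "text" and "label" columns for a two-class task (all names here are placeholders, not a prescribed setup):

    # Minimal sketch: fine-tuning Dutch BERT for sequence classification.
    # "train.csv" and num_labels=2 are placeholders for your own task data.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    model_name = "GroNLP/bert-base-dutch-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

    dataset = load_dataset("csv", data_files={"train": "train.csv"})["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length", max_length=128),
        batched=True,
    )

    args = TrainingArguments(output_dir="dutch-bert-task",
                             num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=dataset).train()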

Upvotes: 1
