krrish

Reputation: 11

How to create a training pipeline for a Hugging Face BERT base uncased clinical NER model

The current BERT base uncased clinical NER model predicts three clinical entity types (Problem, Test, Treatment).

I want to train it on a different clinical dataset so that it recognizes entities such as Disease, Medicine, and Problem.

How can I achieve that?

Upvotes: 1

Views: 491

Answers (1)

Tanmoy

Reputation: 396

Model

There are several models on the Hugging Face Hub that are pretrained on medical and biomedical text; these are likely to perform better than the general-domain bert-base-uncased. BioELECTRA is one of them, and it outperformed existing biomedical NLP models on several benchmarks.

BioELECTRA comes in 3 versions, which differ in their pretraining dataset. I think these 2 are the best to start with:

bioelectra-base-discriminator-pubmed: pretrained on PubMed

bioelectra-base-discriminator-pubmed-pmc: pretrained on PubMed and PMC
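To get the new entity types (Disease, Medicine, Problem), the classification head has to be sized for your own tag set. Here is a minimal sketch, assuming BIO tagging and the `transformers` library; `build_label_maps` and `load_model` are hypothetical helper names, and the checkpoint id you pass in should be checked against the model page on the Hub.

```python
from typing import Dict, List, Tuple


def build_label_maps(
    entity_types: List[str],
) -> Tuple[List[str], Dict[str, int], Dict[int, str]]:
    """Build BIO tag names plus the id mappings a token-classification head needs."""
    labels = ["O"]
    for entity in entity_types:
        labels += [f"B-{entity}", f"I-{entity}"]
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return labels, label2id, id2label


def load_model(checkpoint: str, entity_types: List[str]):
    """Load a pretrained encoder with a classification head sized for our tags."""
    # Imported lazily so the label helper above works without transformers installed.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    labels, label2id, id2label = build_label_maps(entity_types)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForTokenClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
        # Only matters when the checkpoint already carries an NER head with a
        # different label count (e.g. Problem/Test/Treatment): the old head is
        # dropped and a new one is initialized instead of raising an error.
        ignore_mismatched_sizes=True,
    )
    return tokenizer, model
```

You would call it as `load_model("<hub id of the BioELECTRA checkpoint>", ["Disease", "Medicine", "Problem"])`. The same call should also work if you would rather start from the existing clinical NER checkpoint, since the mismatched head is re-initialized.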

NER Datasets:

For NER data, there are several datasets you could use, or you could combine them into a composite dataset. Some options are BC5-disease, NCBI-disease, and BC5CDR-disease from the BLUE benchmark.
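Whichever dataset you pick, the word-level tags have to be aligned with the subword tokens before training. A minimal sketch of the usual approach (label only the first sub-token of each word, mask the rest), assuming a fast tokenizer so that `word_ids()` is available; `align_labels` is a hypothetical helper name:

```python
from typing import List, Optional


def align_labels(
    word_labels: List[int],
    word_ids: List[Optional[int]],
    ignore_index: int = -100,
) -> List[int]:
    """Map one label per word onto one label per sub-token.

    word_ids is what a fast tokenizer returns from
    tokenizer(words, is_split_into_words=True).word_ids().
    -100 is the index PyTorch's cross-entropy loss ignores by default.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)  # special tokens such as [CLS]/[SEP]
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first sub-token of a word
        else:
            aligned.append(ignore_index)  # continuation sub-token
        previous = word_id
    return aligned


# Words: ["Patient", "has", "pneumonia"], with "pneumonia" split into 3 pieces.
example = align_labels([0, 0, 1], [None, 0, 1, 2, 2, 2, None])
print(example)  # [-100, 0, 0, 1, -100, -100, -100]
```

Masking continuation pieces keeps the loss at one prediction per word, which also makes the evaluation line up with the original word-level annotations.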

Let me know if you need any help with creating the model or setting up fine-tuning. Also, please use proper metrics to evaluate the models, and do share the metrics dashboard once training finishes.
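On "proper metrics": for NER this usually means entity-level precision, recall, and F1 (a span counts only if both its type and boundaries match), not token accuracy. Libraries such as seqeval compute this for you; the sketch below is a plain-Python illustration of the idea, assuming BIO tags and a strict reading in which a stray `I-` tag without a preceding `B-` is ignored. `extract_spans` and `entity_f1` are hypothetical names.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags."""
    spans: Set[Span] = set()
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        inside = tag.startswith("I-") and current == tag[2:]
        if not inside:
            if current is not None:
                spans.add((current, start, i))  # close the open span
                start, current = None, None
            if tag.startswith("B-"):
                start, current = i, tag[2:]  # open a new span
    return spans


def entity_f1(
    gold: List[List[str]], pred: List[List[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall, and F1 over sentences."""
    tp = n_pred = n_gold = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)  # exact match on type and boundaries
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-Disease", "I-Disease", "O", "B-Medicine"]]
pred = [["B-Disease", "I-Disease", "O", "O"]]
print(entity_f1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

Reporting these numbers per entity type as well (Disease, Medicine, Problem) makes it easier to see which class the model struggles with.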

Upvotes: 3
