Reputation: 5395
I have a lot of bank transactions which I want to classify into different categories. The issue is that the text is not a full sentence but consists only of a few words, e.g. "private withdrawal", "payment invoice 19234", "taxes", etc.
Since the domain is so specific, I think we might get better performance by fine-tuning an already pre-trained BERT rather than using the pre-trained BERT right away, but how do we do that when we don't have any sentences? I.e. how would the "next sentence prediction" part be handled? Or can we skip it?
Upvotes: 0
Views: 759
Reputation: 772
Your problem is a sequence classification problem. If you want to use a pre-trained model, you want to do transfer learning: take the BERT base model and add a classification layer on top, then fine-tune on your labeled transactions.
You can check huggingface for that https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification
My answer isn't very specific; feel free to add details to your question.
Upvotes: 1