CutePoison

Reputation: 5395

Fine-tune/train a pre-trained BERT on data with no sentences, only words (bank transactions)

I have a lot of bank transactions which I want to classify into different categories. The issue is that the text is not a sentence as such but consists only of short phrases, e.g. "private withdrawal", "payment invoice 19234", "taxes", etc.

Since the domain is so specific, I think we might get better performance by fine-tuning an already pre-trained BERT rather than using the pre-trained BERT as-is. But how do we do that when we don't have any sentences? That is, how would the "next sentence prediction" part be created? Or can we skip it?
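(For reference, BERT's masked-language-modeling objective needs no sentence pairs at all, so the next-sentence-prediction part can be skipped when continuing pre-training on single transaction strings; RoBERTa-style training drops it entirely. A toy sketch of the masking idea, with the probability and token strings invented for illustration:)

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Toy BERT-style masking: each token is replaced by [MASK] with
    probability mask_prob, and the original token becomes the training
    target. This works on single strings, so no sentence pairs (and no
    next-sentence-prediction objective) are required."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # the model must recover this token
        else:
            masked.append(tok)
            labels.append(None)   # not a prediction target
    return masked, labels

random.seed(0)
masked, labels = mask_tokens(
    "payment invoice 19234 private withdrawal taxes".split(),
    mask_prob=0.5,
)
```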

Upvotes: 0

Views: 759

Answers (1)

Alexandre Catalano

Reputation: 772

Your problem is a sequence classification problem. If you want to use a pre-trained model, you want to do transfer learning: take the BERT base model and add a classification layer on top.

You can check Hugging Face for that: https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForSequenceClassification
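To make the "encoder plus classification layer" idea concrete, here is a toy stand-in in plain Python. All embeddings, classifier weights, and category names below are invented for illustration; in the real setup `BertForSequenceClassification` provides the encoder and learns the classification layer during fine-tuning.

```python
# Toy stand-in for "pre-trained encoder + classification layer".
# Everything here is invented for illustration only.

EMBEDDINGS = {  # pretend these 2-d vectors come from a pre-trained encoder
    "payment":    [0.9, 0.1],
    "invoice":    [0.8, 0.2],
    "withdrawal": [0.1, 0.9],
    "taxes":      [0.2, 0.8],
}

# The added classification layer: one (weight vector, bias) per category.
CLASSIFIER = {
    "invoices": ([1.0, -1.0], 0.0),
    "cash":     ([-1.0, 1.0], 0.0),
}

def classify(text):
    # Skip tokens the toy vocabulary does not know (e.g. "19234").
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    # Mean-pool the token vectors (BERT instead uses the [CLS] token's vector).
    pooled = [sum(v[i] for v in vecs) / len(vecs) for i in range(2)]
    # Linear layer: score = w . pooled + b, then pick the best category.
    scores = {cat: sum(w * p for w, p in zip(wv, pooled)) + b
              for cat, (wv, b) in CLASSIFIER.items()}
    return max(scores, key=scores.get)

print(classify("payment invoice 19234"))  # -> invoices
print(classify("private withdrawal"))     # -> cash
```

Fine-tuning the real model means training both the encoder weights and this final layer on your labeled transactions; since each transaction is a single short string, no sentence-pair input is needed for classification either.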

My answer isn't very specific; feel free to add details to your question.

Upvotes: 1
