Reputation: 40649
We are trying to understand the underlying model of Rasa (the forums there still haven't given us an answer) on two main questions:
We understand that the Rasa model is a transformer-based architecture. Was it pre-trained on any dataset (e.g. Wikipedia)?
Then, if we understand correctly, intent classification is a fine-tuning task on top of that transformer. How come it works with such small training sets?
Appreciate any insights!
Thanks,
Lior
Upvotes: 0
Views: 286
Reputation: 448
The transformer model is not pre-trained on any dataset. We use quite a shallow stack of transformer layers, which is not as data-hungry as the deeper stacks used in large pre-trained language models. That said, there isn't an exact number of data points that will be sufficient for training your assistant; it varies by domain and by your problem. A good rule of thumb is usually 30-40 examples per intent.
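For concreteness, here is a minimal sketch of where this shows up in a Rasa project. In Rasa 2.x/3.x the transformer-based intent classifier is the DIETClassifier component in `config.yml`, and the transformer depth is an ordinary hyperparameter. The pipeline below is illustrative rather than a recommendation; the commented values are the documented defaults:

```yaml
# config.yml -- illustrative NLU pipeline (assumes Rasa 2.x/3.x)
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb        # character n-grams help with typos and rare words
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 300                        # documented default
    number_of_transformer_layers: 2    # the "shallow stack" mentioned above
    transformer_size: 256              # documented default
```

The 30-40 examples per intent then refers to the NLU training data, which in Rasa's YAML format looks roughly like this (again an illustrative snippet, with made-up intent names):

```yaml
# nlu.yml -- aim for ~30-40 varied examples under each intent
nlu:
  - intent: greet
    examples: |
      - hey
      - hello there
      - good morning
```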
Upvotes: 1