Reputation: 40649
We are trying to understand the underlying model of Rasa (the forums there still haven't given us an answer) on two main questions:
We understand that the Rasa model is a transformer-based architecture. Was it pre-trained on any dataset (e.g. Wikipedia)?
Then, if we understand correctly, intent classification is a fine-tuning task on top of that transformer. How come it works with such small training sets?
Appreciate any insights!
Thanks,
Lior
Upvotes: 0
Views: 286
Reputation: 448
The transformer model is not pre-trained on any dataset. We use quite a shallow stack of transformer layers, which is not as data-hungry as the deeper stacks used in large pre-trained language models. That said, there isn't an exact number of data points that will be sufficient for training your assistant; it varies by domain and by your problem. A good rule of thumb is usually 30-40 examples per intent.
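For concreteness, here is a minimal sketch of where this shows up in a Rasa project. In Rasa 2.x/3.x the transformer-based intent classifier is the DIETClassifier component in `config.yml`, and the transformer depth is an ordinary hyperparameter. The pipeline below is illustrative rather than a recommendation; the commented values are the documented defaults:

```yaml
# config.yml -- illustrative NLU pipeline (assumes Rasa 2.x/3.x)
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb        # character n-grams help with typos and rare words
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 300                        # documented default
    number_of_transformer_layers: 2    # the "shallow stack" mentioned above
    transformer_size: 256              # documented default
```

The 30-40 examples per intent then refers to the NLU training data, which in Rasa's YAML format looks roughly like this (again an illustrative snippet, with made-up intent names):

```yaml
# nlu.yml -- aim for ~30-40 varied examples under each intent
nlu:
  - intent: greet
    examples: |
      - hey
      - hello there
      - good morning
```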
Upvotes: 1