Reputation: 1
I would like to train a vision transformer from scratch. I'm not sure this is feasible with the original ViT, given the heavy training process it went through. So I wanted to ask: what is the recommended way to get performance close to the state of the art with a relatively efficient training process?
I saw that Facebook released DeiT for this purpose, but that was a few years ago and its performance isn't really close to SoTA. Is there anything more recent or better? References and Git links (preferably with PyTorch) would be greatly appreciated :)
Thank you!!
Upvotes: 0
Views: 47