Is it okay to augment the data first, then randomly select a subset and split it afterward?

Question

I am doing a science project on classifying medical images, but I do not have much data. Is it okay to augment the data first, then randomly select which augmented images to keep, and only then split the kept data? My teacher originally told me to augment the data first and then split it into training, validation, and test sets. However, I think that approach lets the training set overlap with the test set, because augmented copies of the same original image can end up on both sides of the split. That would make the accuracy unrealistically high. So I thought that randomly choosing which files to keep after augmentation would stop the augmented images from being too similar to each other, and would also fix the class imbalance in my dataset.
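The overlap described above is usually called data leakage: variants of one original image land in both training and test sets. The sketch below is a hypothetical illustration, not a definitive pipeline. It assumes each image is just an integer ID with a label, and it uses a placeholder `augment()` that only tags variants instead of transforming pixels. It compares the order proposed in the question (augment, keep a random subset, then split) with a widely used alternative (split the originals first, then augment only the training set). `leaked_sources()` counts original images that show up on both sides.

```python
import random
from collections import Counter

def augment(item, n_copies):
    """Return n_copies augmented variants of one original image.

    Placeholder (assumption): a real pipeline would apply flips, small
    rotations, etc. Each variant is tagged (source_id, label, variant_index)
    so it can be traced back to the original image it was made from.
    """
    source_id, label = item
    return [(source_id, label, k) for k in range(n_copies)]

def stratified_split(items, test_frac, rng):
    """Shuffle and split items per class so both sides keep the class ratio."""
    by_label = {}
    for item in items:
        by_label.setdefault(item[1], []).append(item)
    train, test = [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_frac))
        test += group[:n_test]
        train += group[n_test:]
    return train, test

def leaked_sources(train, test):
    """IDs of original images that appear, in any form, in both train and test."""
    return {x[0] for x in train} & {x[0] for x in test}

def augment_then_split(originals, copies, keep_frac, test_frac, seed=0):
    """Order from the question: augment, keep a random subset, then split."""
    rng = random.Random(seed)
    augmented = [v for item in originals for v in augment(item, copies[item[1]])]
    kept = rng.sample(augmented, int(len(augmented) * keep_frac))
    return stratified_split(kept, test_frac, rng)

def split_then_augment(originals, copies, test_frac, seed=0):
    """Alternative: split the originals first, then augment only the training set."""
    rng = random.Random(seed)
    train_orig, test_orig = stratified_split(originals, test_frac, rng)
    train = [v for item in train_orig for v in augment(item, copies[item[1]])]
    test = [(sid, label, None) for sid, label in test_orig]  # untouched originals
    return train, test

# Hypothetical imbalanced dataset: 80 "normal" and 20 "abnormal" images.
originals = [(i, "normal") for i in range(80)] + [(i, "abnormal") for i in range(80, 100)]
# More copies for the minority class so the training set ends up balanced.
copies = {"normal": 3, "abnormal": 12}

train, test = augment_then_split(originals, copies, keep_frac=0.5, test_frac=0.2)
print(len(leaked_sources(train, test)))  # test images still have siblings in train

train, test = split_then_augment(originals, copies, test_frac=0.2)
print(len(leaked_sources(train, test)))  # 0 by construction
print(Counter(x[1] for x in train), Counter(x[1] for x in test))
```

In this sketch, randomly discarding augmented files does not remove the leak, because any kept variant of a test image's source can still sit in the training set. Splitting by original image first keeps the test set independent, and the class imbalance is handled by augmenting the minority class more heavily in the training set only.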

