Reputation: 587
Suppose I am building a network to localize an object. My training data consists of images captured at 5 different locations, and it's a small dataset (each location has around 2k images). Should I pool all the images, shuffle them, and then split them into training (60%), dev (20%), and test (20%) sets, or should I use data from 3 locations for training, 1 location for test, and 1 location for dev?
Upvotes: 0
Views: 305
Reputation: 101
Ideally the training, validation (dev), and test sets should be drawn from the same distribution. Going by that, you should pool all the images, shuffle them, and then split them into training (60%), dev (20%), and test (20%). This should also help your net be more invariant to location (it would learn to ignore 5 locations rather than 3) and give it a better chance at generalization thanks to the added diversity.
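A minimal sketch of the pooled 60/20/20 split, using only the standard library. The `pooled_split` helper, the file names, and the `(image_name, location_id)` layout are made up for illustration; only the 5 locations, the roughly 2k images each, and the split ratios come from the question.

```python
import random


def pooled_split(samples, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle all samples together, then cut them into train/dev/test."""
    shuffled = list(samples)               # copy so the input is left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_dev = round(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]      # whatever remains (about 20%)
    return train, dev, test


# Hypothetical dataset: 5 locations with 2000 images each.
samples = [
    (f"loc{loc}_img{i:04d}.jpg", loc)
    for loc in range(5)
    for i in range(2000)
]

train, dev, test = pooled_split(samples)
print(len(train), len(dev), len(test))

# Every location should show up in the training set after pooling.
print(sorted({loc for _, loc in train}))
```

Because the shuffle happens before the cut, each of the three sets ends up with a mix of all 5 locations, which is what "same distribution" means here.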
Upvotes: 1