Should the test and dev sets have a different distribution from the training set?

Suppose I am building a network to localize an object. My training data consists of images captured at 5 different locations, and it's a small data set (each location has around 2k images). Should I pool all the images, shuffle them, and then split them into training (60%), dev (20%), and test (20%)? Or should I use the data from 3 locations for training, 1 location for dev, and 1 location for test?

Upvotes: 0

Answers (1)

mohit bhatia

Reputation: 101

Ideally, the training, dev (validation), and test sets should all be drawn from the same distribution. Going by that, you should pool all the images, shuffle them, and then split them into training (60%), dev (20%), and test (20%). This should also help your net become more invariant to location, since it would learn from 5 locations instead of 3, and the added diversity gives it a better chance of generalizing.
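As a rough illustration of the pooled split described above, here is a minimal sketch in plain Python. The function name `split_pooled` and the `(location_id, image_index)` tuples are hypothetical stand-ins for your real image files; only the 5 locations, the roughly 2k images per location, and the 60/20/20 ratios come from the question.

```python
import random


def split_pooled(items, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle all items together, then cut them into train/dev/test.

    Hypothetical helper for illustration; whatever is left after the
    train and dev cuts becomes the test set.
    """
    items = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_dev = round(len(items) * dev_frac)
    train = items[:n_train]
    dev = items[n_train:n_train + n_dev]
    test = items[n_train + n_dev:]
    return train, dev, test


# Placeholder data: 5 locations with 2000 images each (assumed sizes).
images = [(loc, i) for loc in range(5) for i in range(2000)]

train, dev, test = split_pooled(images)
print(len(train), len(dev), len(test))
```

Because the images are shuffled before the cut, each of the three sets ends up with images from every location, which is what "same distribution" means in practice here. If the images within a location are near-duplicates (for example consecutive video frames), a plain shuffle like this can put almost identical images in both training and test, so that is worth checking on your data.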

Upvotes: 1
