Reputation: 103
When I try to upload zip files, I get this error:
INFO: <-- HTTP FAILED: java.net.SocketException: Connection reset by peer: socket write error
I checked the requirements in the IBM Watson documentation.
I have already taken care of every requirement stated there.
I have paid for the service and changed the API key.
Total zips: Around 1000.
Each zip contains around 15 images.
I think the issue might be the total size of my zip files, which is around 1 GB. Is the large number of zip files the problem? The same code works fine with fewer zip files.
List<File> allZipPath = new ArrayList<File>();
// add zip paths
Builder classBuilder = new ClassifierOptions.Builder();
for (int i = 0; i < allZipPath.size(); i++) {
    // Use the zip file name as the class name
    classBuilder.addClass(allZipPath.get(i).getName(), allZipPath.get(i));
}
ClassifierOptions createCanaryOptions = classBuilder.classifierName(classifierName).build();
// A negative-examples zip can be added with ".negativeExamples(new File(myFilePath + "cats.zip"))" before ".build()"
result = service.createClassifier(createCanaryOptions).execute();
//System.out.println(result);
System.out.println("Classifier created with Id: " + result.getId() + "\n\n");
Upvotes: 1
Views: 777
Reputation: 1106
Thanks for your interest in Visual Recognition.
The documentation is written with the assumption that you are submitting one zip file per class within the classifier that you are training.
Are you splitting examples from the same class into different .zip files? That is possible, but not necessary unless your examples for a single class exceed 100MB.
The recommended pattern for training is to make a single request, totaling under 256 MB, that contains all the examples for each class. If you have more training data than that, you can submit additional "retraining" requests, which add more classes and/or more examples to existing classes. Retraining is documented here: https://www.ibm.com/watson/developercloud/doc/visual-recognition/tutorial-custom-classifier.html#to-add-new-classes-to-an-existing-classifier
The service requires a minimum of 10 images per .zip file.
Minimum recommend size of an image is 32X32 pixels.
To clarify, these are minimums: there must be at least 10 example images for each class within the classifier you are training. It is best to put all the training images you can for a class into a single .zip file, subject to the limit of 100MB per .zip file. If you have more examples than that per class, you can use the retraining function to add more.
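As a rough pre-upload check, the two per-class limits mentioned above (at least 10 images, at most 100MB per .zip) could be tested locally before sending anything. This is only a sketch: the class and method names are made up for illustration, it assumes every non-directory entry in the archive is an image, and it assumes "100MB" means 100 × 1024 × 1024 bytes, which the service may count differently.

```java
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the Watson SDK.
class ZipCheck {
    // Limits as quoted in this thread; assumed values that may change.
    static final int MIN_IMAGES_PER_ZIP = 10;
    static final long MAX_ZIP_BYTES = 100L * 1024 * 1024;

    // Pure check on the two numbers, so it can be reused without touching the disk.
    static boolean meetsLimits(long zipBytes, int imageCount) {
        return zipBytes <= MAX_ZIP_BYTES && imageCount >= MIN_IMAGES_PER_ZIP;
    }

    // Counts non-directory entries; assumes each one is an image.
    static int countFiles(File zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            int count = 0;
            for (ZipEntry entry : Collections.list(zf.entries())) {
                if (!entry.isDirectory()) {
                    count++;
                }
            }
            return count;
        }
    }

    // Convenience wrapper for a file on disk.
    static boolean isValidClassZip(File zip) throws IOException {
        return meetsLimits(zip.length(), countFiles(zip));
    }
}
```

Calling ZipCheck.isValidClassZip(file) on each entry of allZipPath before building the request would flag any archive that is too small or too large.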
Also, 32x32 is the minimum size. Ideally you should submit original size images, but if you need to shrink them to save time or bandwidth, you can resize to 224x224 for now without loss of training quality. (Exact sizes are subject to change in the future.)
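If shrinking is needed, one way to do it with the standard Java 2D classes is sketched below. The class name is hypothetical, and the code makes two assumptions that are not confirmed anywhere in this thread: that stretching to exactly 224x224 (ignoring aspect ratio) is acceptable, and that images already at or below that size should be left alone.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of the Watson SDK.
class ImageShrinker {
    // Target edge length taken from the answer; subject to change.
    static final int TARGET = 224;

    // Returns a 224x224 copy of larger images; smaller images are returned unchanged.
    static BufferedImage shrink(BufferedImage src) {
        if (src.getWidth() <= TARGET && src.getHeight() <= TARGET) {
            return src; // never upscale
        }
        BufferedImage dst = new BufferedImage(TARGET, TARGET, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Scales the whole source image into the target square (aspect ratio is not preserved).
            g.drawImage(src, 0, 0, TARGET, TARGET, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

Images can be loaded and saved with javax.imageio.ImageIO.read and ImageIO.write around this call.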
I have paid for the service and changed the API key.
Total zips: Around 1000.
Does this mean your POST /classifiers request contains around 1000 form fields? That could be the source of the problem at some point in the connection between client code and server.
Each zip contains around 15 images.
While the system does have a minimum of 10 images per class, providing more examples (like 100-200) generally leads to much better results.
I think the issue might be the total size of my zip files, which is around 1 GB. Is the large number of zip files the problem? The same code works fine with fewer zip files.
As you noted, if your total request size is 1GB, this will be over the 256 MB limit:
The service accepts a maximum of 256 MB per training call.
and that could cause the error you observe.
My advice would be to start by training fewer than 1000 classes, with as many examples per class as you can, and evaluate your results before going to 1000 classes. If you have already done that, the best strategy (since you said you have about 1GB of data in total) would be to split it into 1 original training request (under 256MB total size) and 3-4 additional retraining requests, each under 256MB. Retraining is billed by the number of images submitted in the request (as is original training), so the cost is the same as if a single large request had succeeded. You can expect each training request to take 1-2 seconds per image.
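The split into one training request plus a few retraining requests can be planned from the zip file sizes alone. The following is a minimal sketch under stated assumptions: the class and method names are invented for illustration, "256MB" is taken to mean 256 × 1024 × 1024 bytes, and multipart form overhead is ignored, so leaving some headroom below the limit would be prudent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner, not part of the Watson SDK.
class TrainingPlan {
    // Per-request limit quoted in this thread; assumed value.
    static final long MAX_REQUEST_BYTES = 256L * 1024 * 1024;

    // Greedily groups zip files, in input order, into batches whose total size stays within limit.
    // Returns, for each batch, the indices of the zips (positions in zipSizes) it contains.
    static List<List<Integer>> batchBySize(List<Long> zipSizes, long limit) {
        List<List<Integer>> batches = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long total = 0;
        for (int i = 0; i < zipSizes.size(); i++) {
            long size = zipSizes.get(i);
            if (size > limit) {
                throw new IllegalArgumentException("zip " + i + " alone exceeds the request limit");
            }
            // Start a new batch when adding this zip would push the total over the limit.
            if (!current.isEmpty() && total + size > limit) {
                batches.add(current);
                current = new ArrayList<>();
                total = 0;
            }
            current.add(i);
            total += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

The first batch would go to the original create-classifier call and each remaining batch to a retraining request. With about 1000 zips of roughly 1MB each, this yields 4 batches, which matches the estimate above. At roughly 1000 × 15 = 15,000 images and 1-2 seconds per image, the training time summed over all requests would be on the order of 4 to 8 hours.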
Additional info on training guidelines is here: https://www.ibm.com/watson/developercloud/doc/visual-recognition/customizing.html#guidelines-for-good-training
Upvotes: 0