I'm trying to run a simple Ground Truth labeling job with a private workforce for text classification. Since I'm new to Amazon SageMaker Ground Truth, I have some questions:
1. If I use a private workforce, what is the maximum number of people I can assign to the labeling job? Does the price depend on the number of people in the private workforce?
2. I have a labeled dataset (text classification) that I uploaded to an S3 bucket. If I upload additional unlabeled data to it, will AutoML label the new raw data? If not, how can I use the already labeled dataset to label new raw data?
3. The Ground Truth documentation says that it needs at least 1,000 objects to be labeled by humans. Does that mean 1,000 objects across all classes, or 1,000 objects per class? If I manually label 1,000+ objects, how many more objects will AutoML label, or what is the maximum number of objects AutoML can label?
I'm the product manager for Amazon SageMaker Ground Truth, and I would be happy to answer your query. Here are my responses:
[1] Your private labeling workforce can be as large or as small as you like. The pricing does not depend on the size of your labeling workforce.
[2] You can learn more about how to bring a "partially" labeled dataset here: https://docs.aws.amazon.com/sagemaker/latest/dg/sms-reusing-data.html#sms-reusing-data-newdata
You can also use the ML model trained in a previous labeling job. Learn more here: https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-ground-truth-using-a-pre-trained-model-for-faster-data-labeling/
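As a rough illustration of preparing such a dataset, here is a minimal Python sketch that writes a JSON Lines input manifest. It keeps entries carried over from a previous job's output manifest and appends new unlabeled text objects, each as `{"source": "<text>"}` (the key Ground Truth uses for inline text objects). The label attribute name `my-label`, the helper function names, and the simplified shape of the previously labeled entries are assumptions for illustration only; check the linked "reusing data" page for how Ground Truth treats objects that already carry labels.

```python
import json

def manifest_lines(texts):
    """Return one JSON Lines entry per unlabeled text object."""
    # json.dumps escapes quotes and newlines, so each object stays on one line.
    return [json.dumps({"source": t}) for t in texts]

def combine(labeled_entries, new_texts):
    """Keep previously labeled entries, then append new unlabeled objects.

    labeled_entries: dicts read from an earlier job's output manifest.
    They already carry a label attribute (the hypothetical "my-label" here).
    """
    lines = [json.dumps(entry) for entry in labeled_entries]
    lines.extend(manifest_lines(new_texts))
    return lines

def write_manifest(lines, path):
    """Write the manifest file that would then be uploaded to S3."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    labeled = [{"source": "great product", "my-label": 0}]
    lines = combine(labeled, ["arrived broken", "works fine"])
    write_manifest(lines, "input.manifest")
```

Each output line is a standalone JSON object, which is why the sketch serializes entries one at a time instead of dumping a single JSON array.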
[3] To clarify, you need 1,000 dataset objects to start an auto-labeling job, but some of these 1,000 objects can be auto-labeled (the % depends on your data and use case). It is 1,000 objects across your classes - i.e. there is no additional requirement beyond having 1,000 text dataset objects.
You can learn more about the mechanics of auto-labeling from this blog post: https://aws.amazon.com/blogs/machine-learning/annotate-data-for-less-with-amazon-sagemaker-ground-truth-and-automated-data-labeling/
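The linked post describes auto-labeling roughly like this: humans label a sample, a model is trained on it, objects that the model labels with high enough confidence are auto-labeled, and the rest are sent to human workers. The toy Python sketch below shows only that routing step. It is not Ground Truth's implementation. The `(object_id, label, confidence)` tuple layout and the threshold value are hypothetical. It also shows why the share of auto-labeled objects depends on your data: the share is simply how many predictions clear the threshold.

```python
def route_by_confidence(predictions, threshold):
    """Split model predictions into auto-labeled objects and a human queue.

    predictions: iterable of (object_id, label, confidence) tuples (hypothetical layout).
    threshold: minimum confidence to accept the model's label (hypothetical value).
    """
    auto_labeled, needs_human = [], []
    for obj_id, label, confidence in predictions:
        if confidence >= threshold:
            auto_labeled.append((obj_id, label))   # model label accepted
        else:
            needs_human.append(obj_id)             # sent to human workers
    return auto_labeled, needs_human

if __name__ == "__main__":
    preds = [("a", "pos", 0.97), ("b", "neg", 0.60), ("c", "neg", 0.95)]
    auto, human = route_by_confidence(preds, threshold=0.95)
    print(auto, human, len(auto) / len(preds))
```

With a stricter threshold, fewer objects are auto-labeled and more go to humans. This is the trade-off behind the answer's point that the auto-labeled percentage depends on your data and use case.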