Chintak

Reputation: 63

Auto-labeling for text data with Amazon SageMaker Ground Truth

What is the minimum number of text rows needed for Ground Truth to do auto-labeling? I have a text file containing 1,000 rows. Is this enough to get started with auto-labeling in SageMaker Ground Truth?

Upvotes: 2

Views: 561

Answers (2)

vikmadan

Reputation: 116

I'm a product manager on the Amazon SageMaker Ground Truth team, and I'm happy to help you with this question. The minimum system requirement is 1,000 objects. In practice with text classification, we typically see meaningful results (% of data auto-labeled) only once you have 2,000 to 3,000 text objects. Remember performance is variable and depends on your dataset and the complexity of your task.
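
As a rough illustration of the numbers above, here is a minimal sketch that counts the rows in a text file and compares the count against those thresholds. It assumes one text object per non-empty line. The function names and the constants `SYSTEM_MINIMUM` and `PRACTICAL_MINIMUM` are hypothetical helpers for this example, not part of any SageMaker API, and the upper threshold is a rule of thumb rather than a hard limit.

```python
# Thresholds taken from the answer above, not from any SageMaker API.
SYSTEM_MINIMUM = 1_000      # minimum object count stated above
PRACTICAL_MINIMUM = 2_000   # lower end of the 2,000 to 3,000 range stated above


def count_text_objects(path):
    """Count non-empty lines, assuming each one is a single text object."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def auto_labeling_readiness(n_objects):
    """Classify an object count against the thresholds above."""
    if n_objects < SYSTEM_MINIMUM:
        return "below minimum"
    if n_objects < PRACTICAL_MINIMUM:
        return "meets minimum, limited benefit likely"
    return "within practical range"
```

For a file with exactly 1,000 non-empty rows, `auto_labeling_readiness(count_text_objects(path))` returns `"meets minimum, limited benefit likely"`, which matches the situation described in the question.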

Upvotes: 2

Ujjwal Bhardwaj

Reputation: 755

From the documentation:

You should use automated data labeling only on large datasets. The neural networks used with active learning require a significant amount of data for every new dataset. With larger datasets there is more potential to automatically label the data and therefore reduce the total cost of labeling. We recommend that you use thousands of data objects when using automated data labeling. You must use at least 5,000 data objects

https://docs.aws.amazon.com/sagemaker/latest/dg/sms-automated-labeling.html

Upvotes: 0
