How to Train and Test a Custom Classifier Processor of Document AI Using Python

Question

I want to train and test a custom document classifier using Python code. I found the train processor sample and started implementing it by following the documentation, but I get an error when I call the function:

train_processor_version_sample(497857003374, 'us','a530739de44a7ca6',"Version-1","gs://documentai-bucket-123/pdfs","gs://documentai-bucket-123/test")

Error:

InvalidArgument                           Traceback (most recent call last)
Input In [22], in ()
----> 1 train_processor_version_sample(497857003374, 'us','a530739de44a7ca6',"Version-1","gs://documentai-bucket-123/pdfs","gs://documentai-bucket-123/test")

Input In [17], in train_processor_version_sample(project_id, location, processor_id, processor_version_display_name, train_data_uri, test_data_uri)
     52 print(operation.operation.name)
     53 # Wait for operation to complete
---> 54 response = documentai.TrainProcessorVersionResponse(operation.result())
     56 metadata = documentai.TrainProcessorVersionMetadata(operation.metadata)
     58 print(f"New Processor Version:{response.processor_version}")

File ~/anaconda3/lib/python3.9/site-packages/google/api_core/future/polling.py:261, in PollingFuture.result(self, timeout, retry, polling)
    256 self._blocking_poll(timeout=timeout, retry=retry, polling=polling)
    258 if self._exception is not None:
    259     # pylint: disable=raising-bad-type
    260     # Pylint doesn't recognize that this is valid in this case.
--> 261     raise self._exception
    263 return self._result

InvalidArgument: 400 Invalid dataset. See operation metadata for specific errors
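The message says to look at the operation metadata, so one way to see the specific errors is to catch the exception and read `operation.metadata` instead of letting the traceback end the cell. The following is a minimal sketch, not an official sample: `summarize_validation` and `train_and_report` are hypothetical helper names, and it assumes the metadata exposes `training_dataset_validation` and `test_dataset_validation`, each with `dataset_errors` and `document_errors` entries that carry a `message`. The stand-in object at the bottom is made up for illustration only.

```python
from types import SimpleNamespace


def summarize_validation(metadata):
    """Collect error messages from the training and test dataset validation.

    `metadata` is assumed to look like documentai.TrainProcessorVersionMetadata:
    it has `training_dataset_validation` and `test_dataset_validation`, each
    with `dataset_errors` and `document_errors` (objects with a `message`).
    """
    lines = []
    for name in ("training_dataset_validation", "test_dataset_validation"):
        validation = getattr(metadata, name, None)
        if validation is None:
            continue
        for kind in ("dataset_errors", "document_errors"):
            for status in getattr(validation, kind, []):
                lines.append(f"{name}.{kind}: {status.message}")
    return lines


def train_and_report(client, request):
    """Hypothetical wrapper: start training and print validation errors on failure."""
    # Imported lazily so this file still loads without the Google client installed.
    from google.api_core.exceptions import InvalidArgument
    from google.cloud import documentai

    operation = client.train_processor_version(request=request)
    try:
        return operation.result()
    except InvalidArgument:
        metadata = documentai.TrainProcessorVersionMetadata(operation.metadata)
        for line in summarize_validation(metadata):
            print(line)
        raise


# Stand-in metadata for demonstration only; the message text is made up.
fake = SimpleNamespace(
    training_dataset_validation=SimpleNamespace(
        dataset_errors=[SimpleNamespace(message="example dataset error")],
        document_errors=[],
    ),
    test_dataset_validation=SimpleNamespace(dataset_errors=[], document_errors=[]),
)
print(summarize_validation(fake))
```

With a real client, `train_and_report(client, request)` would take the same `request` object that the sample passes to `client.train_processor_version`.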

I have some idea of the cause: the custom document classifier has training dataset requirements.

Training guidelines

Minimum 2 labels required in the schema

Each label exists on 10 training documents

Each label exists on 2 test documents
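To check these guidelines before starting a training run, the label counts can be compared against the three numbers above. This is a rough sketch that treats the numbers as minimums (my reading of the guidelines); `check_dataset` is a hypothetical helper, and it assumes you already have one label string per document for each set.

```python
from collections import Counter

MIN_LABELS = 2
MIN_TRAIN_DOCS_PER_LABEL = 10
MIN_TEST_DOCS_PER_LABEL = 2


def check_dataset(train_labels, test_labels):
    """Return a list of problems; an empty list means the counts look fine.

    train_labels / test_labels: one label string per document in that set.
    """
    problems = []
    train_counts = Counter(train_labels)
    test_counts = Counter(test_labels)
    labels = sorted(set(train_counts) | set(test_counts))

    if len(labels) < MIN_LABELS:
        problems.append(
            f"schema needs at least {MIN_LABELS} labels, found {len(labels)}"
        )
    for label in labels:
        if train_counts[label] < MIN_TRAIN_DOCS_PER_LABEL:
            problems.append(
                f"{label}: {train_counts[label]} training documents, "
                f"need {MIN_TRAIN_DOCS_PER_LABEL}"
            )
        if test_counts[label] < MIN_TEST_DOCS_PER_LABEL:
            problems.append(
                f"{label}: {test_counts[label]} test documents, "
                f"need {MIN_TEST_DOCS_PER_LABEL}"
            )
    return problems


# Example with made-up labels: "receipt" has too few training documents.
print(check_dataset(["invoice"] * 10 + ["receipt"] * 3,
                    ["invoice"] * 2 + ["receipt"] * 2))
```

An empty result only means the counts satisfy the guidelines listed here; the service may still reject documents for other reasons.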

I don't know how to get the URL of a labeled dataset, or how to pass two bucket directories (one for the training set and one for the test set) using Python code. Can anyone help me with this?
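For reference, here is a hedged sketch of how the two bucket directories could be passed as separate training and test prefixes. It assumes the `google-cloud-documentai` client with `TrainProcessorVersionRequest.InputData`, `BatchDocumentsInputConfig` and `GcsPrefix`, and it assumes both folders hold labeled documents rather than raw PDFs. `as_gcs_prefix` and `build_train_request` are hypothetical helper names, and appending a trailing slash is only a precaution so that a prefix such as `.../test` does not also match a sibling folder like `.../test-old`.

```python
def as_gcs_prefix(uri):
    """Validate a gs:// URI and make sure it ends with a slash."""
    if not uri.startswith("gs://"):
        raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return uri if uri.endswith("/") else uri + "/"


def build_train_request(parent, display_name, train_uri, test_uri):
    """Hypothetical helper: build a training request from two GCS folders."""
    # Imported lazily so this file still loads without the Google client installed.
    from google.cloud import documentai

    input_data = documentai.TrainProcessorVersionRequest.InputData(
        training_documents=documentai.BatchDocumentsInputConfig(
            gcs_prefix=documentai.GcsPrefix(
                gcs_uri_prefix=as_gcs_prefix(train_uri)
            )
        ),
        test_documents=documentai.BatchDocumentsInputConfig(
            gcs_prefix=documentai.GcsPrefix(
                gcs_uri_prefix=as_gcs_prefix(test_uri)
            )
        ),
    )
    return documentai.TrainProcessorVersionRequest(
        parent=parent,
        processor_version=documentai.ProcessorVersion(display_name=display_name),
        input_data=input_data,
    )


# The helper only normalizes the string; it does not contact Cloud Storage.
print(as_gcs_prefix("gs://documentai-bucket-123/test"))
```

The resulting request would then go to `client.train_processor_version(request=...)`, with `parent` being the processor resource name.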
