How to specify document language while importing a dataset in Google Cloud AutoML?

Question

I am trying to train a text classification model in Vertex AI AutoML (Google Cloud) using documents written in Spanish. I imported the documents as JSON Lines and tried to specify the language of each document as follows:

{"textContent":"Esto está escrito en español","languageCode":"es-ES","classificationAnnotations":[{"displayName":"Class A"},{"displayName":"Class B"}]}
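For reference, here is a minimal sketch of how a line like this could be generated with Python's standard `json` module. The helper name `make_record` is hypothetical and only for illustration; the field names are the ones used in the line above, and nothing here calls the Vertex AI API.

```python
import json


def make_record(text, language_code, labels):
    """Build one JSON Lines record (hypothetical helper, not a Vertex AI API)."""
    record = {
        "textContent": text,
        "languageCode": language_code,
        "classificationAnnotations": [{"displayName": label} for label in labels],
    }
    # ensure_ascii=False keeps accented characters readable in the file;
    # compact separators produce a single line without extra spaces.
    return json.dumps(record, ensure_ascii=False, separators=(",", ":"))


line = make_record("Esto está escrito en español", "es-ES", ["Class A", "Class B"])
print(line)
```

When writing such lines to a file, opening it with `encoding="utf-8"` should keep the accented characters intact.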

According to the schema file referenced in the Vertex AI documentation on preparing training data, the line above should work. However, I could not find a way to check whether the language was imported correctly, and when I export the dataset again, the languageCode field contains an empty string.
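The check I ran on the exported file can be sketched roughly as below. This assumes the export is a JSON Lines file that uses the same `languageCode` field name; the function name `records_missing_language` and the sample lines are hypothetical and only illustrate the check, they are not real export output.

```python
import json


def records_missing_language(jsonl_lines):
    """Return 1-based line numbers whose languageCode is missing or empty."""
    missing = []
    for number, raw in enumerate(jsonl_lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        if not record.get("languageCode"):
            missing.append(number)
    return missing


# Hypothetical sample lines, made up for illustration only.
exported = [
    '{"textContent":"Esto está escrito en español","languageCode":""}',
    '{"textContent":"Otro documento","languageCode":"es-ES"}',
]
print(records_missing_language(exported))
```

The same function can be given an open file object instead of a list, since it only iterates over lines.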

What is the correct way to specify the language of a document when importing it into a dataset? Is there any way to verify that the language was imported correctly?

Answers (0)
