I am trying to create a batch prediction job with textembedding-gecko-multilingual@latest, with no success.
After several failures with the error
Failed to import data. Not found: Dataset he5311ce8b310161b-tp:llm_bp_tenant_dataset_2558528304743186432 was not found in location US
I created a new dataset pinned to a single location, us-central1.
The table has only one column, content.
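For context, the dataset and table were set up roughly like this (a minimal sketch using the google-cloud-bigquery client; the names match the URIs in the jobs below):

from google.cloud import bigquery

client = bigquery.Client(project="xxx-data")

# Dataset pinned to the single region us-central1 (not the US multi-region).
dataset = bigquery.Dataset("xxx-data.job_results")
dataset.location = "us-central1"
client.create_dataset(dataset, exists_ok=True)

# Input table with a single STRING column named "content".
table = bigquery.Table(
    "xxx-data.job_results.articles_4_eval",
    schema=[bigquery.SchemaField("content", "STRING")],
)
client.create_table(table, exists_ok=True)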
I then tried two different approaches:
1.
import subprocess
import requests

url = "https://us-central1-aiplatform.googleapis.com/v1/projects/xxx-data/locations/us-central1/batchPredictionJobs"
TOKEN = subprocess.getoutput("gcloud auth print-access-token")
headers = {"Authorization": f"Bearer {TOKEN}"}

# Batch prediction job request: read rows from the BigQuery table and
# write embeddings to a BigQuery destination table.
req = {
    "name": "Gecko_embedding",
    "displayName": "Gecko_embedding",
    "model": "publishers/google/models/textembedding-gecko-multilingual",
    "inputConfig": {
        "instancesFormat": "bigquery",
        "bigquerySource": {
            "inputUri": "bq://xxx-data.job_results.articles_4_eval",
        },
    },
    "outputConfig": {
        "predictionsFormat": "bigquery",
        "bigqueryDestination": {
            "outputUri": "bq://xxx-data.job_results.articles_4_eval_vectors",
        },
    },
}
response = requests.post(url, headers=headers, json=req)
2.
import vertexai
from vertexai.language_models import TextEmbeddingModel

vertexai.init(project="xxx-data", location="us-central1")
model = TextEmbeddingModel.from_pretrained("textembedding-gecko-multilingual@latest")
task_type = "SEMANTIC_SIMILARITY"
model.batch_predict(
    dataset="bq://xxx-data.job_results.articles_4_eval",
    destination_uri_prefix="bq://xxx-data.job_results.articles_4_eval_vectors",
)
For both I get the same error:
Batch prediction job Gecko_embedding encountered the following errors:
Do not support publisher model textembedding-gecko-multilingual
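For reference, the error above is reported on the job resource itself after the job starts running; this is a minimal sketch of how it can be read back via the REST API (assuming the POST in attempt 1 succeeded and response holds the created job):

# Sketch: poll the created batch prediction job to surface its state
# and error (assumes the POST above returned the job resource).
job_name = response.json()["name"]  # projects/.../batchPredictionJobs/...
status = requests.get(
    f"https://us-central1-aiplatform.googleapis.com/v1/{job_name}",
    headers=headers,
).json()
print(status.get("state"))  # e.g. JOB_STATE_FAILED
print(status.get("error"))  # carries the "Do not support publisher model ..." message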