elvainch

Reputation: 1407

Testing the Great Expectations datasource YAML with BigQuery

I am having trouble testing the YAML of a Great Expectations datasource against BigQuery. I followed the official documentation and arrived at this code:

import great_expectations as ge

datasource_yaml = """
name: my_bigquery_datasource
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    include_schema_name: true
"""
context = ge.get_context()

context.test_yaml_config(datasource_yaml)

The code works, but it takes a very long time. After some deep debugging I found the problem: it tries to retrieve all the datasets in the BigQuery project and all the tables from every dataset. We have over 200 datasets and thousands of tables. I haven't found a way to filter down to the one dataset I need, or more specifically to the one table. I thought the connection_string would do it, but it doesn't.

While debugging I got to the inferred_asset_sql_data_connector.py module. I saw that it should filter by schema_name; the problem is that schema_name always comes through as None, and I don't know how to pass it the dataset I want.


I followed the introspection guide as well, but got other errors.

If I use SimpleSqlalchemyDatasource as the class_name I get a different error, and I don't know how to initialize the SQLAlchemy engine for BigQuery in the context of Great Expectations.


Upvotes: 2

Views: 1010

Answers (1)

Abhinay

Reputation: 474

The default_inferred_data_connector_name connector tries to fetch all dataset and table info from BigQuery and creates an asset for every table. You can remove default_inferred_data_connector_name and instead use a RuntimeBatchRequest with a query to validate the data.

Regarding the authentication issue, you can change the

connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>

to

connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>?credentials_path=<path_to_credential file >

More info on the SQLAlchemy configuration can be found at https://github.com/googleapis/python-bigquery-sqlalchemy
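Putting both suggestions together, the datasource YAML would look something like the following sketch: the inferred connector is removed and the credentials path is appended to the connection string (the project, dataset, and path placeholders are yours to fill in):

```yaml
name: my_bigquery_datasource
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>?credentials_path=<path_to_credential_file>
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
```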

Upvotes: 2

Related Questions