Reputation: 1407
I am having trouble testing the Great Expectations YAML config for BigQuery. I followed the official documentation and arrived at this code:
import os
import great_expectations as ge
datasource_yaml = """
name: my_bigquery_datasource
class_name: Datasource
execution_engine:
  class_name: SqlAlchemyExecutionEngine
  connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>
data_connectors:
  default_runtime_data_connector_name:
    class_name: RuntimeDataConnector
    batch_identifiers:
      - default_identifier_name
  default_inferred_data_connector_name:
    class_name: InferredAssetSqlDataConnector
    include_schema_name: true
"""
context = ge.get_context()
context.test_yaml_config(datasource_yaml)
The code works, but it takes a very long time. After some deep debugging I saw that the problem is that it tries to retrieve every dataset in the BigQuery project and every table in each of those datasets. We have over 200 datasets and thousands of tables. I haven't found a way to restrict it to the one dataset I need, or more specifically to a single table. I thought the connection_string would do that, but it doesn't.
While debugging I got to the inferred_asset_sql_data_connector.py module. I saw that it is supposed to filter by schema_name; the problem is that schema_name always comes through as None, and I don't know how to pass the dataset I want.
I also followed this guide on introspection, but I am getting other errors.
If I put SimpleSqlalchemyDatasource as the class_name, I get the following error. I also don't know how to initialize the engine for BigQuery in SQLAlchemy in the context of Great Expectations.
Upvotes: 2
Views: 1010
Reputation: 474
The default_inferred_data_connector_name connector tries to fetch all dataset and table info from BigQuery and creates an asset for each table it finds. You can remove default_inferred_data_connector_name, use a RuntimeBatchRequest instead, and pass a query to select the data you want to validate.
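A minimal sketch of that approach, assuming the same legacy Great Expectations API the question uses (`ge.get_context()`, `RuntimeBatchRequest`, `context.get_validator`). The function names, the `test_suite` suite name, the `LIMIT` clause, and the identifier value are illustrative choices, not something taken from the question. The datasource keeps only the runtime connector, so nothing is introspected, and the query names the one table to validate.

```python
def build_datasource_config(project: str, dataset: str) -> dict:
    """Datasource config with only a RuntimeDataConnector (no introspection)."""
    return {
        "name": "my_bigquery_datasource",
        "class_name": "Datasource",
        "execution_engine": {
            "class_name": "SqlAlchemyExecutionEngine",
            "connection_string": f"bigquery://{project}/{dataset}",
        },
        "data_connectors": {
            "default_runtime_data_connector_name": {
                "class_name": "RuntimeDataConnector",
                "batch_identifiers": ["default_identifier_name"],
            },
        },
    }


def build_batch_request_kwargs(dataset: str, table: str, limit: int = 1000) -> dict:
    """Keyword arguments for a RuntimeBatchRequest that queries a single table."""
    return {
        "datasource_name": "my_bigquery_datasource",
        "data_connector_name": "default_runtime_data_connector_name",
        "data_asset_name": table,  # free-form label for the batch
        "runtime_parameters": {
            "query": f"SELECT * FROM `{dataset}.{table}` LIMIT {limit}"
        },
        "batch_identifiers": {"default_identifier_name": "default_identifier"},
    }


def get_validator(project: str, dataset: str, table: str):
    """Needs great_expectations, the BigQuery SQLAlchemy dialect, and GCP credentials."""
    # Imported here so the two builders above can be used without the library.
    import great_expectations as ge
    from great_expectations.core.batch import RuntimeBatchRequest

    context = ge.get_context()
    context.add_datasource(**build_datasource_config(project, dataset))
    context.create_expectation_suite(
        expectation_suite_name="test_suite", overwrite_existing=True
    )
    batch_request = RuntimeBatchRequest(**build_batch_request_kwargs(dataset, table))
    return context.get_validator(
        batch_request=batch_request, expectation_suite_name="test_suite"
    )
```

Calling `get_validator("<GCP_PROJECT_NAME>", "<BIGQUERY_DATASET>", "<TABLE>")` and then `validator.head()` should touch only the queried table. Newer Great Expectations releases have changed this API, so treat the calls as version-dependent.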
Regarding the authentication issue, you can change the connection string from
connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>
to
connection_string: bigquery://<GCP_PROJECT_NAME>/<BIGQUERY_DATASET>?credentials_path=<path_to_credential_file>
More info on the SQLAlchemy configuration can be found at https://github.com/googleapis/python-bigquery-sqlalchemy
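If the connection string is assembled in code, a small helper along these lines could build it. The `bigquery_url` name and the sample values are hypothetical; only the `credentials_path` query parameter comes from the connection string above.

```python
from typing import Optional
from urllib.parse import urlencode


def bigquery_url(project: str, dataset: str,
                 credentials_path: Optional[str] = None) -> str:
    """Build a bigquery:// URL, optionally pointing at a service-account key file."""
    url = f"bigquery://{project}/{dataset}"
    if credentials_path:
        # safe="/" keeps the file path readable instead of percent-encoding slashes.
        url += "?" + urlencode({"credentials_path": credentials_path}, safe="/")
    return url


print(bigquery_url("my-project", "my_dataset", "/secrets/key.json"))
# bigquery://my-project/my_dataset?credentials_path=/secrets/key.json
```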
Upvotes: 2