Reputation: 281
I am trying to use Google Cloud Dataplex to manage a data lake with a bunch of BigQuery tables generated from files/folders in multiple GCS buckets. I have created a few custom Dataplex Catalog aspect types and have attached aspects of those types to a few of the tables. My goal is to be able to programmatically search and filter various datasets/tables/assets/whatever using the values of the aspects I have attached.
My problem, however, is that when querying an Entry with the Google Cloud Python library, I can only see the names of my attached custom aspects, not the values.
Here is some simplified code, lightly adapted from the sample functions provided in the Google Cloud GitHub repository:
from google.cloud import dataplex_v1

def sample_get_entry(name):
    client = dataplex_v1.CatalogServiceClient()  # Create a client
    request = dataplex_v1.GetEntryRequest(name=name)  # Initialize request argument(s)
    response = client.get_entry(request=request)  # Make the request
    print(response)  # Handle the response

# location
sample_entry_name = "projects/<MYPROJECT>/locations/us-east1/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/<MYPROJECT>/datasets/<MYDATASET>/tables/<MYTABLE>"
# get the entry info
sample_get_entry(sample_entry_name)
And here is simplified output:
name: "projects/<MYPROJECT>/locations/us-east1/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/<MYPROJECT>/datasets/<MYDATASET>/tables/<MYTABLE>"
entry_type: "projects/<NUMBER>/locations/global/entryTypes/bigquery-table"
create_time {
  seconds: 1740156479
  nanos: 220021000
}
update_time {
  seconds: 1740170785
  nanos: 551316000
}
aspects {
  ...
  ...
  ...
}
...
...
...
aspects {
  key: "<NUMBER>.global.bigquery-table"
  value {
    aspect_type: "projects/<NUMBER>/locations/global/aspectTypes/bigquery-table"
    create_time {
      seconds: 1740156479
      nanos: 220021000
    }
    update_time {
      seconds: 1740156479
      nanos: 220021000
    }
    data {
      fields {
        key: "type"
        value {
          string_value: "EXTERNAL_TABLE"
        }
      }
      fields {
        key: "tableType"
        value {
          string_value: "EXTERNAL"
        }
      }
      fields {
        key: "connectionId"
        value {
          string_value: ""
        }
      }
    }
    aspect_source {
      create_time {
        seconds: 1740156478
        nanos: 531000000
      }
      update_time {
        seconds: 1740156478
        nanos: 855000000
      }
      data_version: "Ingestion/1.0.0"
    }
  }
}
aspects {
  key: "<NUMBER>.us-east1.MYCUSTOM-aspect1"
  value {
  }
}
aspects {
  key: "<NUMBER>.global.MYCUSTOM-aspect2"
  value {
  }
}
parent_entry:
...
...
As you can see, the Google-constructed aspects display their values. The last two aspects listed are my custom aspects, and although their keys are displayed, none of their values are.
How can I access the values of aspects assigned to my data?
Upvotes: 0
Views: 28
Reputation: 328
Try setting the view field on the GetEntryRequest in your Python script, since the view determines which aspects are returned with the entry. Set it to EntryView.ALL to return all aspects. If the entry has more than 100 aspects, only the first 100 are returned.
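As a rough sketch of that change, the question's function could pass the view along with the name. The function names get_entry_with_all_aspects and aspect_field_names are made up for illustration, and the client import sits inside the function only so the small helper can be tried without the library or credentials. This assumes the installed google-cloud-dataplex version exposes dataplex_v1.EntryView.

```python
def get_entry_with_all_aspects(name):
    """Fetch an entry, asking the service to return every aspect with its data."""
    # Imported lazily (an illustrative choice) so the helper below can be
    # exercised without the client library installed.
    from google.cloud import dataplex_v1

    client = dataplex_v1.CatalogServiceClient()
    request = dataplex_v1.GetEntryRequest(
        name=name,
        # Without this, the question's output showed only the keys of the
        # custom aspects; ALL asks for all aspects (up to the first 100).
        view=dataplex_v1.EntryView.ALL,
    )
    return client.get_entry(request=request)


def aspect_field_names(aspects):
    """Map each aspect key to the sorted field names present in its data.

    Works on any mapping whose values have a `.data` attribute that is a
    mapping or None, so an empty list flags an aspect that came back empty.
    """
    return {key: sorted(aspect.data or {}) for key, aspect in aspects.items()}


# Usage (needs credentials and the entry name from the question):
# entry = get_entry_with_all_aspects(sample_entry_name)
# print(aspect_field_names(entry.aspects))
```

If the custom aspects still come back with empty lists after this change, the view is probably not the cause, and it would be worth checking that values were actually saved on those aspects.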
Upvotes: 1