Reputation: 1092
I have a dataset and a table in Google BigQuery (BQ). For the dataset, I can add a description; for the table, I can add a description and column policy tags to control column-level access (I am ignoring the "Labels" and "Tags" that can be attached to any BQ resource).
Next, in Dataplex, I created a lake and a zone, and then attached the previous BQ dataset to the zone.
Then I searched for the BQ table on the "Search" page under the "Discover" page in Dataplex. Two results come up, one with "System" as "BIGQUERY" and one with "System" as "DATAPLEX". When I select the two results, I find the following points:
My understanding is that the entry with "System" as "BIGQUERY" is Data Catalog metadata (the URL contains the string ...entryGroups/@bigquery/entries/...), whereas the one with "System" as "DATAPLEX" is a Dataplex entry. For the same table, I was also able to add different metadata through the Data Catalog entry and through the Dataplex entry. The system accepts this without complaint: the metadata from Data Catalog does not surface in the Dataplex entry and vice versa, and the metadata from neither surfaces in the BQ UI.
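To inspect the Data Catalog side outside the UI, one option is to look the table up by its BigQuery resource name. The following is a minimal sketch, not a confirmed reproduction of the setup above: the helper names and the project, dataset, and table IDs are hypothetical placeholders, and the lookup assumes the `google-cloud-datacatalog` package is installed and credentials are configured.

```python
def bigquery_linked_resource(project: str, dataset: str, table: str) -> str:
    """Build the full resource name that identifies a BigQuery table.

    Hypothetical helper; the arguments are placeholder IDs.
    """
    return (
        f"//bigquery.googleapis.com/projects/{project}"
        f"/datasets/{dataset}/tables/{table}"
    )


def lookup_data_catalog_entry(project: str, dataset: str, table: str):
    """Fetch the Data Catalog entry for a BigQuery table.

    Assumes google-cloud-datacatalog is installed and application
    default credentials are available; imported lazily so the helper
    above works without the package.
    """
    from google.cloud import datacatalog_v1

    client = datacatalog_v1.DataCatalogClient()
    return client.lookup_entry(
        request={
            "linked_resource": bigquery_linked_resource(project, dataset, table)
        }
    )


if __name__ == "__main__":
    # Placeholder IDs; replace with real ones before calling the lookup.
    print(bigquery_linked_resource("my-project", "my_dataset", "my_table"))
```

Printing the `name` of the entry returned by `lookup_data_catalog_entry(...)` should show which entry group it belongs to, which can then be compared with the URL seen in the Dataplex search results.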
Is the above behavior expected? It seems that there are three sources of metadata for the same table: one in BQ, one in Data Catalog, and one in the Dataplex entry, all independent of each other (although the Data Catalog and Dataplex metadata are a superset of the BQ metadata).
Upvotes: 0
Views: 900
Reputation: 1092
There is a detailed discussion in the Google Cloud Community forum: https://www.googlecloudcommunity.com/gc/Data-Analytics/Dataplex-metadata-vs-Data-catalog-metadata/td-p/746966
Upvotes: 0